So here’s the thing about Unix. Everything is a file. Your config, your kernel modules, your running processes, even your block devices show up under /dev/ as files. Once that clicks, all the file commands in this post stop being trivia and start feeling like the keys to the whole system.
This is the file-manipulation companion to the text manipulation post. Every command here is one I’ve run this week. They’re boring, foundational, and the reason I can move around a server faster than I can move around the Finder on my Mac.

If you’re brand new, skim the top 10 Linux commands first. This post assumes you know how to cd, ls, and read a man page.
cp: copying files without surprises
cp copies files. The command is simple; the gotchas are in the flags.
# Copy a single file
cp /home/user/.bashrc /tmp/
# Copy a directory recursively (this is the flag people forget)
cp -R ./work /tmp/
# Preserve permissions, owner, and timestamps when copying
cp -a /etc/nginx /etc/nginx.bak
# Copy only the contents of a directory, not the directory itself
cp /work/* /tmp/
# Verbose: print every file as it copies
cp -v -R big-folder /tmp/
# Don't overwrite existing files
cp -n source.txt /tmp/
The flag I use most is -a (“archive”). It copies files while preserving everything: permissions, ownership, symlinks, modification times. When I’m taking a backup of a config directory before editing, cp -a is the only sensible choice. Plain cp -R resets owner to whoever ran the command, which is rarely what you want.
The -n flag (no clobber) has saved me from overwriting work-in-progress files more than once. If a destination file already exists, cp -n skips it instead of replacing it.
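A minimal sanity check of that behaviour, run entirely in a scratch directory so nothing real is touched. Note that newer GNU coreutils (9.2+) exit non-zero when -n skips a file, so don’t rely on the exit status:

```shell
# cp -n never replaces an existing destination
tmp=$(mktemp -d)
echo "original" > "$tmp/dest.txt"
echo "new"      > "$tmp/src.txt"
# || true because GNU coreutils 9.2+ exit non-zero when -n skips
cp -n "$tmp/src.txt" "$tmp/dest.txt" || true
cat "$tmp/dest.txt"    # still prints "original"
rm -rf "$tmp"
```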
mv: rename, or move across filesystems
mv moves a file or directory. On the same filesystem, it’s just a rename: instant, no copy. Across filesystems (say from /home to a USB drive), it does a copy followed by a delete, which means it can be slow and can leave you with a partial move if it fails midway.
# Rename a file
mv old-name.txt new-name.txt
# Move a file to another directory
mv report.pdf ~/Documents/
# Move a directory (no -R needed; mv handles directories natively)
mv ./work /tmp/
# Move all files in current directory to /tmp/
mv ./* /tmp/
# Don't overwrite if destination exists
mv -n source.txt /tmp/
# Interactive: prompt before overwriting
mv -i source.txt /tmp/
A pattern I use weekly: rename a file with a timestamp suffix before editing. mv config.yaml config.yaml.$(date +%Y%m%d). That gives me a recoverable history without setting up a real backup tool. On servers I admin long-term, I keep a shell function bk() { cp -a "$1" "$1.$(date +%Y%m%d)"; } in my .bashrc (it has to be a function, not an alias, because aliases can’t take arguments) so making a backup before editing is one keystroke shorter than the editor command itself.
If you’re renaming many files at once with a pattern, mv doesn’t do that natively. Use rename (Perl-based, on most distros) or a for loop in bash.
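Here’s what that bash loop looks like, sketched on throwaway files in a scratch directory. The ${f%.txt} expansion strips the suffix so you can swap it for a new one:

```shell
# Rename every .txt in a scratch directory to .md
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt"
for f in "$tmp"/*.txt; do
    mv "$f" "${f%.txt}.md"    # ${f%.txt} removes the trailing .txt
done
ls "$tmp"    # a.md  b.md
rm -rf "$tmp"
```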
rm: deleting files, with the warnings you should heed
rm removes files. Most people learn rm -rf once and never look back, which is fine until the day they typo a path and watch a year of work evaporate.
# Remove a file
rm temporary.txt
# Remove a directory and everything inside it
rm -r /tmp/old-build/
# Force remove: never prompt, and don't error on files that don't exist
rm -rf /tmp/old-build/
# Interactive: prompt before each delete
rm -i ~/Downloads/*.zip
# Verbose: print each filename as it's removed
rm -v *.log
Two things have saved me from rm -rf disasters over the years. First, I always type the path before the flags. rm /tmp/build first, then add -rf after I’ve checked the path. The opposite order (rm -rf then path) is how typos turn into bug reports. Second, modern rm ships with --no-preserve-root for a reason: by default, rm -rf / is blocked on most systems. The flag exists to override that block. Don’t override it.
A safer alternative on systems where you have it: trash-cli (brew install trash on Mac, sudo apt install trash-cli on Linux). It moves files to a trash folder you can restore from, instead of nuking them. I’ve aliased rm to trash on my personal laptop and never looked back.
mkdir: creating directories without the multi-step shuffle
mkdir creates directories. The single flag worth knowing is -p, which creates intermediate directories if they don’t exist:
# Make one directory
mkdir new_work
# Make a directory inside a path that exists
mkdir /tmp/new_work
# Make a directory along with any missing parents
mkdir -p /tmp/projects/2024/q3/notes
# Make multiple directories at once
mkdir docs scripts tests
# Make many at once with brace expansion
mkdir -p project/{src,tests,docs}/{python,go,rust}
That last brace-expansion trick creates nine leaf directories (plus their parents) in one line. Once you’ve seen it, you’ll use it constantly. The brace expansion isn’t specific to mkdir; it works with any command, so cp file.txt{,.bak} will copy file.txt to file.txt.bak in one go.
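You can verify the count yourself in a scratch directory. This needs bash (or zsh), since brace expansion is a shell feature, not a mkdir one:

```shell
# The shell expands the braces before mkdir ever runs
tmp=$(mktemp -d)
mkdir -p "$tmp"/project/{src,tests,docs}/{python,go,rust}
# Count the leaf directories two levels down: 3 x 3 = 9
find "$tmp/project" -mindepth 2 -type d | wc -l    # 9
rm -rf "$tmp"
```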
chmod and chown: who can read, write, and run what
chmod (change mode) sets file permissions. chown (change owner) sets the user and group that owns a file. These two together are 90% of how Unix permissions work in practice.
Permissions come in three triplets: owner, group, others. Each triplet has three bits: read, write, execute. So rwxr-xr-- means: owner can read/write/execute; group can read/execute; others can only read.
You can set permissions in symbolic mode (u, g, o, a for the targets; r, w, x for the bits; +, -, = to add, remove, or set):
# Add execute permission for everyone
chmod a+x script.sh
# Remove write permission for "others"
chmod o-w secret.txt
# Set exact permissions: owner rwx, group rx, others r
chmod u=rwx,g=rx,o=r script.sh
# Recursively strip group and other access from ~/.ssh
chmod -R g=,o= ~/.ssh
# The directory itself needs owner execute to stay traversable; keys stay owner-only
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_*
Or in numeric mode, where each triplet becomes one octal digit (read=4, write=2, execute=1):
# 755 = rwxr-xr-x (typical for executables)
chmod 755 script.sh
# 644 = rw-r--r-- (typical for regular files)
chmod 644 README.md
# 600 = rw------- (private files, like SSH keys)
chmod 600 ~/.ssh/id_ed25519
# 700 = rwx------ (private directories, like ~/.ssh)
chmod 700 ~/.ssh
Memorise these four numbers and you’ve covered most real-world cases: 755 for executables, 644 for files, 700 for private dirs, 600 for private files. The full set of GNU chmod docs covers the edge cases.
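One way to convince yourself the two notations line up: set permissions symbolically on a throwaway file and read them back as octal. This sketch uses GNU stat; macOS stat takes -f instead of -c:

```shell
# rwxr-xr-- is owner 4+2+1=7, group 4+1=5, others 4
tmp=$(mktemp -d)
touch "$tmp/demo"
chmod u=rwx,g=rx,o=r "$tmp/demo"
stat -c '%A -> %a' "$tmp/demo"    # -rwxr-xr-- -> 754
rm -rf "$tmp"
```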
chown is simpler. It sets the owner (and optionally the group) of a file. Most of the time you’ll need sudo because you’re handing ownership to a different user.
# Change owner to user "nick", keep current group
sudo chown nick file.txt
# Change owner AND group
sudo chown nick:admins file.txt
# Recursive on a directory (typical when fixing web-server permissions)
sudo chown -R www-data:www-data /var/www/site/
The web-server fix is one I run on every new VPS. Drop a WordPress or Astro build into /var/www, then chown -R www-data:www-data so nginx or Apache can actually serve it.
find: locating any file in any tree
find searches a directory tree by name, type, size, age, permission, or any combination. It’s the Swiss Army knife of file commands, and like a Swiss Army knife, the syntax is a little bit weird.
# Find by name in current directory tree
find . -name "*.log"
# Case-insensitive name match
find ~/ -iname "readme*"
# Find directories only
find / -type d -name "nginx"
# Find files only, larger than 100MB
find / -type f -size +100M
# Find files modified in the last 10 minutes
find / -mmin -10
# Find files modified in the last 7 days
find / -mtime -7
# Find empty files
find ~/ -type f -empty
# Find and DELETE (use carefully)
find /tmp/ -type f -name "*.tmp" -mtime +30 -delete
# Find and run a command on each result
find . -name "*.py" -exec wc -l {} \;
The -exec flag is the leverage. The {} is replaced with each match, and \; ends the command. So that last example runs wc -l against every Python file under the current directory.
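The difference between \; and the alternative + terminator is easy to see with echo as the command: \; runs one process per match, + batches all the matches into a single invocation. A quick demo on scratch files:

```shell
tmp=$(mktemp -d)
touch "$tmp/a.py" "$tmp/b.py" "$tmp/c.py"
# \; runs echo once per match: three invocations, three lines of output
find "$tmp" -name "*.py" -exec echo per-file: {} \;
# + appends all matches to one invocation: a single line of output
find "$tmp" -name "*.py" -exec echo batched: {} +
rm -rf "$tmp"
```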
If you find yourself doing complex pipelines with find, look at fd (a modern, faster alternative with saner defaults). I keep both installed: find for portability, fd for daily comfort.
A pattern I run weekly: cleaning up old build artefacts that pile up across projects. find ~/projects -type d -name node_modules -prune -exec rm -rf {} \; will recursively remove every node_modules folder under ~/projects. The -prune flag stops find from recursing into them, which makes it dramatically faster on a machine with hundreds of nested dependency trees. I’ve reclaimed 80GB of disk in one shot more than once with that command.
Another favourite: find . -type f -size +500M to spot the giant files I’ve accidentally committed or downloaded. The output is short, the surprise is usually large.
gzip, tar, and zip: turning many files into one
The compression toolbox on Linux has three commands you’ll use weekly: gzip for single-file compression, tar for bundling many files into one archive, and zip for cross-platform sharing.
# gzip: compress a file (replaces the original with file.gz)
gzip large.log
# Decompress
gzip -d large.log.gz
# or equivalently
gunzip large.log.gz
# Keep the original file when compressing
gzip -k large.log
# Maximum compression (slow, but smaller)
gzip -9 large.log
tar is the one most people remember the flags for: cvf to create, xvf to extract, tvf to list. The z flag adds gzip compression on the fly:
# Create a tar archive
tar -cvf backup.tar /home/hemant/projects/
# Create a gzipped tar archive (the .tar.gz format you see everywhere)
tar -czvf backup.tar.gz /home/hemant/projects/
# Extract a tar archive
tar -xvf backup.tar
# Extract a gzipped tar archive
tar -xzvf backup.tar.gz
# List the contents without extracting
tar -tvf backup.tar.gz
# Extract into a specific directory
tar -xzvf backup.tar.gz -C /tmp/restore/
I keep czvf and xzvf in muscle memory. c create, x extract, z gzip, v verbose, f filename. The order of the other flags doesn’t matter, but f has to come last in the cluster because it takes the archive name as its argument. Having a fixed order means you type it without thinking. The GNU tar manual has the rest.
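A full round trip (create, list, extract, compare) on scratch data ties the flags together. The -C flag keeps archive paths relative, so the extract lands exactly where you point it:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
echo "hello" > "$tmp/src/note.txt"
tar -czvf "$tmp/backup.tar.gz" -C "$tmp" src     # -C: paths in the archive start at src/
tar -tzvf "$tmp/backup.tar.gz"                   # list: src/ and src/note.txt
mkdir "$tmp/restore"
tar -xzvf "$tmp/backup.tar.gz" -C "$tmp/restore"
diff -r "$tmp/src" "$tmp/restore/src" && echo "round trip OK"
rm -rf "$tmp"
```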
zip and unzip aren’t installed by default on every distro, but they’re the right choice when you’re sharing with someone on Windows or Mac:
# Install (Debian/Ubuntu)
sudo apt install zip unzip
# Create a zip archive
zip archive.zip file1.txt file2.txt file3.txt
# Add a whole directory recursively
zip -r archive.zip ./project/
# Extract
unzip archive.zip
# List contents without extracting
unzip -l archive.zip
Quick comparison of compression levels and speeds, from an actual run on a 1.2GB log file:
| Format | Size after | Time |
|---|---|---|
| Original | 1,234 MB | — |
| gzip (default) | 142 MB | 18s |
| gzip -9 | 138 MB | 41s |
| tar.gz | 142 MB | 19s |
| zip | 145 MB | 22s |
For everyday use, default gzip is the sweet spot. -9 rarely justifies the time. If you need real compression ratios, look at zstd or xz. zstd in particular gives you near-xz compression at near-gzip speed, which is why modern Linux distros increasingly use it for packages, kernel modules, and initramfs images.
Frequently asked questions
What’s the difference between cp -R and cp -a?
cp -R recursively copies files but resets ownership and timestamps to the user running the command. cp -a (archive) preserves everything: permissions, ownership, symlinks, and modification times. For backups or system-config copies, cp -a is almost always what you want.
How do I undo rm?
You can’t, in the general case. Once a file is unlinked from the filesystem and overwritten, it’s gone. That’s why I recommend aliasing rm to trash-cli on personal machines, and double-checking the path on production servers. If you’ve just rm’d something on an ext4 filesystem and haven’t written anything since, tools like extundelete may recover it, but treat that as a long shot, not a strategy.
What does chmod 777 actually mean, and why shouldn’t I use it?
777 gives every user on the system read, write, and execute permission on the file. It’s the universal “make this work right now” answer that often introduces real security holes. If your web app can’t read a config file, the right fix is to figure out which user the web server runs as and chown the file to it, not to make it world-writable.
When should I use tar.gz versus zip?
If you’re staying inside Linux/macOS, use tar.gz (or tar.zst). It preserves Unix file permissions and symlinks. If you’re sharing with a Windows user who’s going to double-click it in Explorer, use zip because Windows handles it natively without extra software.
Why does find need that weird \; at the end of -exec?
The semicolon ends the command being executed; the backslash escapes the semicolon so the shell doesn’t interpret it as a command separator. It’s an old syntax inherited from POSIX. The newer find ... -exec command {} + form (with + instead of \;) batches arguments together for a faster run, similar to how xargs works.
That covers the file commands I touch every day. If you missed it, the text manipulation post covers grep, sed, sort, and friends, and the system-monitoring post covers top, free, df, and du. Together with the basics from the top 10 commands post, you’ve got most of a working Unix shell vocabulary.
The single biggest jump in my own command-line speed came from accepting that the GUI would always feel “more obvious” but always be slower for anything I did more than once. Once cp -a, find -name, and tar -czvf are reflexes, you stop noticing the terminal at all and start noticing the friction of every other tool you use.
Last updated: August 2024