Systems Stuff: Linux Servers, OMG

intro

this tutorial/whatever is primarily about using a linux server via the command line

specifically it is about debian servers, but the majority of these concepts relate to all linux distros. ubuntu is based on debian, so many of the concepts will be directly applicable.

this is a long tutorial, haha

why should you even read this? well, if you want to learn how to use linux properly, and maybe set up your own web server or development box, or a file server, or who knows...

why use linux? why command line?

linux is built to be free, open source, stable, and expandable.

a linux server with no GUI has drastically less overhead than a typical desktop operating system. so if performance is key, as it usually is on a server, then linux is very often the best option.

real players use the command line, and only the command line

GUIs are for babies

really though, when you use a GUI, the computer is just wasting CPU cycles and memory to render the GUI to "make life easier for you". why do this?

you are too cool for this now. if you have a mouse, disconnect it. you don't need it anymore once you've entered a CLI.

in this tutorial, when you see this:

> pwd

that means you're doing something at the command prompt! specifically, that means type pwd at the prompt and hit enter. wow, omg, amazing...

before anything, you need a server

follow this if you don't have a server to play on. if you do, skip to the next section.

ok so we'll make one, both of us together, check this out.

download VirtualBox and debian, specifically the latest CD Image for i386. you can download via bittorrent if you want. go to the debian CD download page, click on i386, and download CD-1.

it's 700MB so it might take a little while to download. alternatively, you could use the netinst version, but I always prefer to have a full CD image.

ANYWAY... when you have those two things downloaded, install VirtualBox if you haven't already. VirtualBox is pretty great; it's a free desktop virtualization package. what does that mean? basically, instead of having another physical computer sitting at your desk, you can create a virtual one on your current computer. it's goddamn magic.

HIT THE "NEW" BUTTON... type in a name for the machine... set the OS to LINUX, set the version to Debian. Hit continue.

give it 512MB of RAM, that should be plenty. it'll probably have a default hard disk of 8GB in size. that's fine. go through that second DISK WIZARD thing, just keep hitting next, the defaults are fine.

when that's done, you'll have a new virtual machine sitting there. you'll see its stats on the right side of the window. magical.

now to install debian on it. right-click on the machine name and click SETTINGS, then STORAGE, then under IDE Controller there should be a little CD icon with "Empty" next to it. Click on "Empty". Click on the little disc icon on the right side, next to the dropdown, and then click on "Choose a virtual blah blah". Go to wherever you saved that debian disk image and select it.

hit ok to get out of settings and now you're ready to boot up! CLICK ON START!

you should see the debian installer boot up. use it. love it. most of the steps are obvious. steps you need to care about:

host name: choose whatever name you want, I guess
root password: again, whatever you want, but don't forget it
first user: irrelevant, you won't be using it, so do whatever
as for what debian wants to do with its disk, just use the defaults
if you want a network mirror, hit yes. and choose one. doesn't matter. I recommend the mit.edu one.
when you get to SOFTWARE SELECTION, select only SSH Server and Standard System Utilities. DESELECT all others. use the space bar to do so.
use the default option for everything else. it'll reboot when it's done.

when it's done, you'll see a login prompt! congrats! now we need to find out its IP so you can remotely connect to it. or you can just log in from here and continue.

to find out its IP, log in using the root username and password. then type

> ifconfig

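which will show you the current network interfaces on the machine. look to the right of eth0 at its inet addr field. that's the IP! (on newer debian releases ifconfig may not be installed by default; ip addr shows the same information.)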

(NOTE: If you get an IP address that starts with 10.whatever then you will probably have to change the VM's settings from NAT to Bridged under the Network options, and then make sure you get an IP that starts with 192.168.whatever. To do this, just type poweroff into the prompt, then edit the VM's settings as I just said, then power it back on and check its IP.)

you're all set with your virtual server. keep that IP address handy. logout by typing

> exit

and now we can move on!

ok so you have a server, cool

I'm assuming you have its IP or hostname or whatever.

ok so now you need to control it. you have a few options.

if SSH is installed on the server... you can connect to it remotely! this is how you'd normally connect to a server. the server is in a datacenter somewhere and you need to get to it from your desk or at home. if you just made a virtual server, pretend it's not there, pretend it's 1000 miles away.

if you do NOT have ssh installed, then you'll have to use the server's command line. with a keyboard in front of the machine. gross. regardless, the commands are the same.

SSH means Secure Shell, and it's the most common way to connect to a linux server. it's very secure and very simple.

if you are on Mac, open up Terminal.app

if you are on PC, download PuTTY (gross)

for putty, put in the IP address or hostname into the "Host Name" field and then hit enter. Login as root. Put in the password. You're in.

for Terminal, type in this, and replace [server/IP] with either the hostname or the IP of the server.

> ssh root@[server/IP]

(for example, me connecting to my own machine would be ssh root@127.0.0.1 or ssh root@localhost)

after connecting via ssh, you should be in! the first time you connect to a machine, ssh will ask whether you trust the server's key fingerprint. type yes and hit enter.

a note about root... "root" is the super awesome admin alpha omega user. it's the most powerful. it has access to everything. it can do anything. you are basically god to this linux machine. that's what root means. if you logged in as another user, you'd probably not have permission to do a lot of things. that's fine sometimes, but root can do anything.

you're totally in

now to learn about what this even is! it's crazy!

omg look at the prompt! it probably looks like this:

debianbox:~#

what is that craziness!? what is that :~# ???

well, the first part, debianbox, will be the hostname of the machine you are logged in to. it should be the same as whatever you named the debian install.

the colon after it just separates that from the rest of the prompt

the squiggly line ~ sits where the prompt shows your current working directory. you usually start in your user's home directory, and ~ is shorthand for it.

if i change my directory to /usr/src, the prompt would look like this all of a sudden:

debianbox:/usr/src#

the # signifies that you are the root user right now. usually it'll show a $ there if you are not root.
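
if you want to see where those prompt pieces come from, here's a rough sketch that rebuilds a similar line by hand. it's an approximation, not how bash actually draws its prompt: it prints the full path instead of ~, and the variable names host, dir and sym are just made up for this example.

```shell
# sketch: rebuild a prompt-like line from its parts
host=$(uname -n)                 # the machine's hostname
dir=$PWD                         # the current working directory
if [ "$(id -u)" -eq 0 ]; then    # user id 0 is always root
  sym='#'
else
  sym='$'
fi
echo "$host:$dir$sym"
```

run it as root and you get a # at the end; run it as anyone else and you get a $.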

the file system: this is where you live now

in a GUI, you as a user "live" inside the GUI... you see windows and applications and menu bars and stuff.

there is no such stuff in a CLI. instead, you live inside the file system of the machine. you always exist inside a directory or a running application.

on windows, the file system looks like this: C:\Windows\whatevs with "Windows" being a directory on the "C:\" drive and "whatevs" being a subdirectory of "Windows"

on linux, the file system looks like this: /usr/bin/whatevs with / being the lowest-level of the file system, "usr" being a directory inside "/", "bin" being a subdirectory of "usr", and "whatevs" living inside "bin"

the / directory, the lowest level of the file system, is commonly referred to as the root directory... so you are the root user and there's a root directory. common theme.

basically, Windows bases a lot on disks. Linux bases itself on a file system. it's much more abstract and beneficial in the long run.
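
to make the "everything hangs off of /" idea concrete, here's a small sketch that walks the example path back up to the root directory using dirname and basename. the path doesn't need to exist for this to work; the commands only chop up the text.

```shell
# sketch: take a path apart, one level at a time
path=/usr/bin/whatevs
name=$(basename "$path")           # the last piece of the path
parent=$(dirname "$path")          # the directory it sits in
grandparent=$(dirname "$parent")   # one level further up
top=$(dirname "$grandparent")      # and we arrive at the root directory
echo "$top -> $grandparent -> $parent -> $name"
```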

running commands!

so now you should have a grasp of what you're in and where you are.

you run commands simply by typing them in the prompt... let's try it!

> pwd

Do it! type that and hit enter. the whole thing should look like this:

debianbox:~# pwd
/root
debianbox:~#

so what just happened? well, first of all, pwd is a command that prints the working directory and it returned /root, which is the home folder of the root user. that is where you are right now!

if you set up your own virtual box server, you ran the ifconfig command successfully to get the machine's IP.

every linux install has hundreds of commands built into it. these commands are merely programs living on the machine somewhere. want to know where the pwd program lives?

> which pwd

that should return something like /bin/pwd, which is where the pwd program lives. it's in the /bin folder! let's go there and see what other programs are stored there!
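
a quick sketch of the same idea for scripts: command -v is the shell's own way of asking where a command lives, and type tells you what kind of thing a name is. the variable name ls_path is just for this example.

```shell
# sketch: find out where a command lives
ls_path=$(command -v ls)     # same idea as "which ls"
echo "ls lives at: $ls_path"
type pwd                     # many shells also carry their own built-in pwd
```

so some commands exist twice: once as a program on disk, and once built into the shell itself.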

> cd /bin

the cd command changes the working directory to whatever you put after it. in this case, we want to go to /bin

so now your prompt should reflect that we're in /bin by looking like this:

debianbox:/bin#

magic. now let's see what is in this directory! to do so, we use the ls command

> ls

that'll give us a multicolumn list of the files in this directory, including subdirectories. but it's not really that great to look at... so let's try adding some options to the command to change what it outputs. try this:

> ls -l

that should give you a big detailed list, a lot of information is here. but let's not worry about that yet. what you just did was add a command switch, specifically the -l, to the ls command. there are even more you can add!

> ls -lah

don't think of that -lah as a word... each letter signifies a different switch. in this case, the l means long (the detailed listing), the a means all, and the h means human-readable. if you look at the output of that command, you'll see that some of the sizes now have a K next to them (this means Kilobytes), which makes the size of the individual files easier to read. The "all" option shows you hidden files.

the first character, the minus sign before them, signifies that the following is a switch. alternatively, you could have written the same command this way:

> ls -l -a -h

and it'll do the same thing! but you could easily bundle them all together after one minus sign.

just for kicks, let's say you want to get the list of files in another directory than the one you're currently in. try this:

> ls -lah /root

you added an input at the end of it, being the directory /root... so this command does what you should expect: list all the files in human-readable format for the directory /root. MAGIC.

that last piece, in this case /root, is sometimes referred to as an argument or parameter or input
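
here's a small sketch of switches plus an argument that you can run without touching anything real. it makes a throwaway directory with mktemp, and the file names in it are made up for the demo. on linux, hidden files are simply the ones whose names start with a dot.

```shell
# sketch: what -a changes, in a scratch directory
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/.hidden"
plain=$(ls "$demo" | wc -l)      # counts only a.txt and b.txt
all=$(ls -a "$demo" | wc -l)     # also counts .hidden, plus . and ..
echo "plain: $plain, all: $all"
ls -lah "$demo"                  # the full detailed view
```

the . and .. entries that -a shows are the directory itself and its parent.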

how to learn about how to learn commands

on debian, and most linux distros, there's a wonderful command called man and it will teach you all about commands! try this:

> man pwd

you will now be on the man page (manual page) for the pwd command! This will show you what it is, how to use it, what switches and arguments you can throw at it, etc.

press q to get out of it and go back to the prompt.

I will always argue that knowing how to find out is more powerful than memorizing commands. now you know how to find out. if you ever encounter a command you are unsure about, try reading its man page first.

most commands go like this: command [switches] [input] [output] as I've described before. here's a quick example that shows all of this. don't do this, but read it:

> cp -r /root /lol

so what's going on here?

cp is the command, and it means copy
-r is a switch or option, and it means recursive, or "everything"
/root is the input, or what you are copying FROM
/lol is the output, or what you are copying TO

so that command copies all the files (recursively, so every subdirectory, too) of the /root directory to the /lol directory.

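if you actually ran that, it would work, since you're root: because there is no /lol directory yet, cp creates /lol as a copy of /root. if /lol already existed, the copy would land inside it as /lol/root.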
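
if you want to see for yourself what cp -r does when the destination does or doesn't exist yet, here's a minimal sketch. it works in a throwaway directory from mktemp instead of touching /root, and the names src, copy, a.txt and b.txt are made up for the demo.

```shell
# sketch: cp -r with a missing vs. an existing destination
work=$(mktemp -d)
mkdir -p "$work/src/sub"
echo hi > "$work/src/a.txt"
echo yo > "$work/src/sub/b.txt"

cp -r "$work/src" "$work/copy"   # copy doesn't exist yet: it becomes a copy of src
cp -r "$work/src" "$work/copy"   # copy exists now: src lands inside it as copy/src

find "$work/copy" -type f | sort # list every file that ended up in copy
```

so a missing destination isn't an error for cp -r; it just changes where things land.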

the coolest things about the shell

so the program you're typing into right now, the thing that shows the prompt and runs your commands, is known as a shell. on debian it's usually bash. just so you know.

here are some AWESOME, INSANE tips on using the shell:

the tab key is your best friend

it acts as an autocompleter for commands and directories!
try this:

> nets (NOW HIT TAB) -> it autofills it to netstat

> cd /usr/sr (NOW HIT TAB) -> it autofills it to /usr/src

also, if you hit tab TWICE, it'll show you a list of possibilities!

> ne (hit tab twice) -> spits out a list of commands that start with "ne"

> cd /usr/ (hit tab twice) -> spits out a list of directories inside /usr

up and down keys go back and forth through your command history. hit the up key to see what you've run so far.
CTRL+C allows you to force-quit (kill) whatever process is currently happening on your shell. this is useful if you're stuck waiting for something to happen and it's obviously broken.
CTRL+R allows you to type and search through your previous commands

now for some useful commands!

the basics

ls -lah (shows you a detailed list of all files in a directory with human-readable file sizes)
cd [where] (changes the current directory you're in)
mkdir [new directory name] (makes directories)
rm or rmdir [file or directory] (rm deletes files; rmdir deletes directories, but only empty ones. rm -r deletes a directory and everything in it, so be careful because it deletes things RECURSIVELY (meaning an entire directory tree! gone!))
ln -s [target file] [new file] (ln alone makes hard links, ln -s makes symbolic links, which is like a shortcut... "links" are something I might explain to you later)
mv [target file] [new file location] (moves and/or renames a file)
cp [target file] [new file location] (copies and/or renames a file)
less [filename here] (opens text file for viewing; there's also one called "more", and "cat", which do similar)
exit (logs you out)
shutdown -h now (shuts down NOW! warning: you won't be able to log back in until you physically power it back on!)
reboot (reboots.)
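
here's a short practice run through a few of the basics above. it's only a sketch: everything happens inside a throwaway directory from mktemp, and the names docs, note.txt and shortcut are made up.

```shell
# sketch: mkdir, cp, mv, ln -s, rmdir and rm -r in a scratch directory
play=$(mktemp -d)
mkdir "$play/docs"
echo "hello" > "$play/docs/note.txt"

cp "$play/docs/note.txt" "$play/docs/copy.txt"      # make a copy
mv "$play/docs/copy.txt" "$play/docs/renamed.txt"   # rename the copy

ln -s "$play/docs/note.txt" "$play/shortcut"        # symbolic link to the note
linked=$(cat "$play/shortcut")                      # reading the link reads the note

# rmdir refuses to delete a directory that still has files in it
rmdir "$play/docs" 2>/dev/null || echo "rmdir says no: docs is not empty"

rm -r "$play/docs"                                  # gone, contents and all
ls -lah "$play"                                     # only the dangling shortcut is left
```

note what happens to the shortcut at the end: the link is still there, but the file it points to is gone.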

a little more than basic

top (like the task manager in windows, press 'h' while open to see commands)
ps (prints a list of stuff that's running, like top, but not as pretty)
man [command name here] (described above)
chmod [lots of options here] (changes permissions on files; its sibling chown changes ownership)
apt-get and apt-cache (apt-get is the primary means of installing stuff... apt-cache lets you search for things to install)
wget [http://wherever.com/lol.jpg] (get a file from the web)
scp [lots of options here] (get or send files to and from other servers via SSH)
vi or vim or nano [filename here] (text editors, I prefer nano, but most hardcore people prefer vi/vim)
ifconfig (shows current network interfaces, like ipconfig)
ping [hostname/IP] and traceroute [hostname/IP] (network connection diagnostics; tracert is the windows name for traceroute)
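
since chmod has "lots of options", here's a minimal sketch of the two most common ways to use it, on a throwaway file from mktemp. the first ten characters of an ls -l line are the file type and the permissions, so the sketch cuts those out to show what changed.

```shell
# sketch: changing permissions with chmod
f=$(mktemp)
chmod 640 "$f"                          # owner: read+write, group: read, others: nothing
perms=$(ls -l "$f" | cut -c1-10)
echo "$perms"

chmod u+x "$f"                          # additionally let the owner execute it
perms_x=$(ls -l "$f" | cut -c1-10)
echo "$perms_x"
```

the numeric form sets all the permissions at once; the letter form (u+x) adds or removes one thing and leaves the rest alone.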

even further above basic

find (finds shit, though the syntax for using it might be confusing)
grep (finds strings inside files, you'll need to learn regular expressions to really get the power)
tail (shows the end of a text file; with -f it keeps following the file as it's updated in near real-time, useful for logs)
telnet (useful for testing ports)
netstat (shows you network connections currently on this machine)
df -h (shows you current disk space available to different partitions of the filesystem)
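
a small sketch of find, grep and tail working on a fake log, so you can try them without digging through real files. the directory comes from mktemp, and app.log and its contents are invented for the example.

```shell
# sketch: find, grep and tail on a made-up log file
logs=$(mktemp -d)
printf 'ok: started\nerror: disk full\nok: retry\nerror: timeout\n' > "$logs/app.log"
touch "$logs/notes.txt"

found=$(find "$logs" -name '*.log')       # find files by name pattern
errors=$(grep -c '^error' "$logs/app.log")  # count lines that start with "error"
last=$(tail -n 1 "$logs/app.log")         # just the last line of the file

echo "found:  $found"
echo "errors: $errors"
echo "last:   $last"
```

the '^error' bit is a (very small) regular expression: ^ means "at the start of the line".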

There are a lot more I'm probably forgetting... I won't get into the really advanced ones.

I'm not really going to get into the nitty-gritty stuff like piping and stdout/stderr

But be aware: the most powerful single-line command sequences involve piping and redirecting the "flow" of data. A simple example:

> ps -e | grep apache

That'll return all the instances of "apache" currently running, and it does so by piping (the | character) the output of ps -e into the input of the grep apache command.
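
Here's a slightly longer sketch of piping and redirecting that gives the same result every time you run it. Instead of real ps output it uses a made-up list of process names stored in a variable called procs, so the numbers are predictable.

```shell
# sketch: pipes and a redirect, using a made-up process list
procs='apache2
sshd
apache2
cron'

# one pipe: count the lines that mention apache
matches=$(printf '%s\n' "$procs" | grep -c apache)

# several pipes: which name shows up most often?
busiest=$(printf '%s\n' "$procs" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# redirect: normal output is thrown away, error messages go to a file
errfile=$(mktemp)
ls /no/such/dir/lol > /dev/null 2> "$errfile" || true

echo "matches: $matches, busiest: $busiest"
cat "$errfile"
```

The > sends normal output (stdout) somewhere, and 2> does the same for error messages (stderr), which is why the complaint from ls ends up in the file.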

Combine this shit with regular expressions and bash scripting and shit gets crazy!

some cool places to be in the filesystem

here are some useful locations:

/var/log - has a lot of logs in it! very useful.
/etc - holds the configuration files for many programs and services
/home - a lot like the "Documents and Settings" folder on Windows, or the "Users" folder on a Mac
/root - the root user's home (~) directory
/bin, /sbin, ... - these are a few locations where shell command programs are stored
/tmp - temporary files, cleaned out by the system (on debian, typically at boot)
/var/www - the common place for a web server to keep its files
/usr/src - a good place to put source files whenever you need to manually compile a program
the rest of the directories in / are pretty much like whatever, they're ok i guess

daemons, aka services

There are many "background services" in linux, usually installed as packages, or they come with the system. These are frequently called "daemons".

you can see them when you run top, and there's always a lot of stuff running in the background.

a lot of these services are controlled by scripts inside of the /etc/init.d/ directory

for example, Apache is an extremely popular web server daemon. It is controlled on a server by the /etc/init.d/apache2 script. If you were to type...

> /etc/init.d/apache2 restart

The service would be restarted! Magic! The server goes down, and for a brief second nobody can access the website, and then it comes back up! (Hopefully.)

Every single script inside /etc/init.d/ must conform to the standard of having the start/stop/restart options.

Most scripts also have the very valuable reload option, which means you don't actually have to stop the service to make a configuration change.

So, for example, let's say we make a change to apache's config files and we need to roll it out immediately. If we were to do a restart, it would cut off everyone currently on the site from trying to access anything. (Nobody might notice since it could only take a second to restart, but it's not good policy.) Instead, we would simply use the reload command instead to commit the config changes, and it would then quietly recycle the current connections when appropriate.

Services like DNS, DHCP, etc, all run like this. A lot of linux server maintenance is editing a service's configuration file and then restarting/reloading the service. For example, apache's main configuration file is /etc/apache2/apache2.conf, which is just a text file.
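
To show what "having the start/stop/restart options" looks like from the inside, here's a minimal sketch of the pattern these scripts follow. It's written as a function so it can be run safely, and the service name "lold" is completely made up. A real init script does actual work at each step instead of just printing.

```shell
# sketch: the start/stop/restart/reload pattern, for a made-up service "lold"
lold_ctl() {
  case "$1" in
    start)   echo "starting lold" ;;
    stop)    echo "stopping lold" ;;
    restart) lold_ctl stop; lold_ctl start ;;    # restart is just stop, then start
    reload)  echo "reloading lold config" ;;     # re-read config without stopping
    *)       echo "usage: lold {start|stop|restart|reload}" >&2
             return 1 ;;
  esac
}

lold_ctl restart
lold_ctl reload
```

The first argument picks the action, which is exactly what happens when you type a script's path followed by restart.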

logs

As a brief note, logs are very important in a linux system. In a GUI, you'd normally get a pop-up or other kind of alert when something goes wrong. Most of the time in the command line, you'd similarly get an immediate error when running a command, which is great. But what if a service goes down? It's running in the background and won't interrupt you. So it'll usually have an error log to explain what happened.

To continue with the apache example, apache has an error log in the /var/log/apache2 folder. That's one of the few ways you'll figure out what went wrong.

syslog and messages are system-wide logs that have a lot of information. individual services will typically have log files inside /var/log.

they're all just text files, except if they end in .gz (those are old, archived logs), so you can open them with the less command and look through them. try it!

> less /var/log/syslog

Look at all that info! Each line should begin with the time the event happened. Press q to quit out and return to the prompt.
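
The old .gz logs can still be read without unpacking them first. Here's a minimal sketch: it fakes a rotated log in a throwaway directory (the log line is invented), compresses it, and then reads it back with gzip -dc, which decompresses to the screen and leaves the file alone.

```shell
# sketch: reading a compressed log without unpacking it
logdir=$(mktemp -d)
printf 'Jan  1 00:00:01 debianbox cron: old news\n' > "$logdir/syslog.1"   # made-up log line
gzip "$logdir/syslog.1"                   # turns it into syslog.1.gz
old=$(gzip -dc "$logdir/syslog.1.gz")     # decompress to stdout only
echo "$old"
```

You can pipe that output into less or grep just like any other text.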

packages! installing stuff!

On Windows and Mac, installing things is typically pretty straightforward. It's also pretty straightforward for the most part on linux. You're going to use two tools: apt-get and apt-cache.

apt-get installs, removes, and upgrades packages. It keeps track of what is installed on your system. For example, let's say you want to install the Apache web server:

> apt-get install apache2

That command will find the "apache2" package, figure out its dependencies, install all of the necessary files (from CD or the internet), and start the service. That simple. You'll probably need to press y when it asks you whether it's okay to use whatever disk space it needs.

Note: every once in a while you should run apt-get update, which fetches the latest list of packages from the internet. (If you have a network mirror for apt-get set up, which you should.)

But what do you do if you don't know what package to install? That's where apt-cache comes into play. Try this:

> apt-cache search apache2

That'll print off a whole long list of packages associated with apache2. Now try this:

> apt-cache show apache2

That'll show you the specific information for the package called "apache2". What it does, what it requires, what you might want to install with it, etc.

apt-get upgrade will upgrade any existing packages to their latest versions. While this might seem like a great idea, keep in mind that you need to make sure you back up configuration files in case the upgrade breaks things.
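
One simple way to do that backup is a tar archive. The sketch below is only an illustration: it uses a throwaway directory as a stand-in for /etc, and the config contents are a placeholder. On a real server you would point tar at the actual config directory instead.

```shell
# sketch: back up a config directory before upgrading
conf=$(mktemp -d)                          # stand-in for /etc
mkdir -p "$conf/apache2"
echo "ServerName debianbox" > "$conf/apache2/apache2.conf"   # placeholder config

backup="$conf/apache2-backup.tar.gz"
tar -czf "$backup" -C "$conf" apache2      # c = create, z = gzip, f = archive file name
tar -tzf "$backup"                         # t = list what's inside, to double-check
```

The -C switch tells tar to change into that directory first, so the paths inside the archive start at apache2/ instead of carrying the whole path along.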

compiling from source, or, the coolest of the cool

One of the coolest things about linux is that you may need to compile a program from its source files in order to get it working. What does this mean, exactly?

The vast majority of programs on a computer are binary files that are written in machine code so a computer can understand how to execute them. However, rarely do people write the programs in machine code! That'd be crazy! Instead they write them in languages like C or C++, and then they have to compile those source files into a binary program the machine will understand.

So why would you need to do this yourself? Well, a pillar of linux is that it's adaptable, but in order to be adaptable to different environments it usually needs you to do extra work as a user.

On Windows, you almost never need to compile from source because the platform of Windows is built on standards that rarely change, like having an x86 processor. Linux, on the other hand, assumes very little, so that it can try to get the maximum amount of performance out of whatever hardware it's on. Windows would rather stack layers upon layers of APIs and services to do the work for you, which just bogs down the system.

Therefore most linux programs won't have one binary which will work universally everywhere on every platform! Instead, you need to do the work of compiling it yourself so that it works correctly and efficiently.

Please note that if the program you're compiling is available on apt-get, it's always better to install it that way. Compiling from source is kind of a last resort. Apt-get will install a binary that it already knows will work on your platform.

Before compiling, make sure your linux install has the necessary files to do so. Usually this means you need to install the build-essential package, like this:

> apt-get install build-essential

That'll make sure you have all the programs necessary to compile from source.
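
If you're not sure whether the build tools are already there, here's a small sketch that checks. The helper name have is made up for this example; gcc and make are two of the tools that build-essential pulls in.

```shell
# sketch: check whether the basic build tools are installed
have() { command -v "$1" > /dev/null 2>&1; }   # true if the command can be found

for tool in gcc make; do
  if have "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing (try apt-get install build-essential)"
  fi
done
```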

Anyway, basically you get the source files from the web, and you usually have to run these three commands, one after another:

> ./configure
> make
> make install

The first command runs a script that is typically bundled with the source code. This "configure" script gets all the information about your platform and writes this info to a file (usually a Makefile).

The make command takes that file and actually builds the binary files from the source files. This usually takes a while! You'll probably see massive amounts of crazy text go by as it does this. Make yourself some tea while it does.

The last command simply takes those finished binary files and copies them to a common place (usually somewhere under /usr/local) where they can be used in the shell alongside all the other commands you use.
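
Why does putting a file in "a common place" make it a command? Because the shell looks through the directories listed in the PATH variable whenever you type a name. Here's a minimal sketch of that mechanism, using a throwaway directory and a made-up command called lolcmd instead of a real install.

```shell
# sketch: a program becomes a command once its directory is on PATH
bindir=$(mktemp -d)                        # pretend install location
printf '#!/bin/sh\necho "lol from a fresh install"\n' > "$bindir/lolcmd"
chmod +x "$bindir/lolcmd"                  # make it executable

PATH="$bindir:$PATH"                       # add the directory to the search path
out=$(lolcmd)                              # now it runs by name, from anywhere
where=$(command -v lolcmd)                 # and the shell can tell you where it lives

echo "$out"
echo "$where"
```

A real make install skips the PATH step because it copies files into directories that are already on the search path.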

Some things that are fun to compile from source: ffmpeg, node.js, php. Another benefit gained by compiling from source is you can typically get the most bleeding-edge version of these programs. However, they are not tracked by apt-get, so if you compile php from source and then apt-get install php, you'll end up with two copies that can conflict, and the packaged one may be the one that actually runs! So keep this in mind.

node.js, for example, hasn't always been available as a debian package, so on older releases you have to compile from source if you want to use it.

in conclusion

There is so much to do with a linux system, so little time to do it all! Really, learning is best achieved through fucking up. Breaking things. That's the beauty of having a virtual server, you can break it and start fresh anytime. Take chances!

I suggest trying the following:

Set up a simple LAMP web server. That's Linux, Apache, MySQL, and PHP.
Build a node.js development box.
Start a simple FTP or Samba file server.
Learn Ruby or Python via the command line.

if you have any questions, email me! if you think there's anything i've critically missed, email me! cyle_gage@emerson.edu