Pass out handout
Tools Installation:
* Windows: Git Bash installation
* macOS: use of Terminal
Introduction to lesson resources:
Downloading data files for lesson: http://github.com/ucsdlib/libcarp-data-notes/
The data:
* Run git clone https://github.com/ucsdlib/libcarp-data-notes/
* Open and show the data files that will be used in the lesson
* Unzip 2014-01_JA.tsv.zip in the libcarp-data-notes folder
Get the data at:
git clone https://github.com/ucsdlib/libcarp-data-notes/
Cloning into 'libcarp-data-notes'...
remote: Counting objects: 55, done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 55 (delta 19), reused 33 (delta 8), pack-reused 0
Unpacking objects: 100% (55/55), done.
Checking connectivity... done.
In this session we will introduce programming by looking at how data can be manipulated, counted, and mined using the Unix shell, a command line interface or CLI to your computer and the files to which it has access.
The shell is one of the most productive programming environments ever created.
A Unix shell is a command-line interpreter that provides a user interface for Unix and Unix-like operating systems such as Linux and macOS.
For Windows users, programs such as Git Bash provide a Unix-like interface.
The motivations for wanting to learn shell commands are many and various.
To give you some idea of what the shell can do, I will demonstrate how to find the number of articles published in 2009 in academic history journals whose title contains the word ‘International’.
First, let’s use a shell command to see how big this file is.
Type:
wc -l ~/desktop/libcarp-data-notes/data/2014-01_JA.tsv
507732 /Users/rotsuji/desktop/libcarp-data-notes/data/2014-01_JA.tsv
The shell command wc with the argument -l (for lines) tells us this is a data file of more than 500,000 lines. Excel will struggle to manipulate that, but the shell won’t.
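To get a feel for wc -l before pointing it at the full data file, a minimal sketch on a throwaway file may help. The file name sample.tsv and its contents are made up for illustration and are not part of the lesson data.

```shell
# Scratch directory and a three-line, tab-separated file (made-up data).
demo=$(mktemp -d)
printf 'a\t1\nb\t2\nc\t3\n' > "$demo/sample.tsv"

# With a file argument, wc -l prints the line count followed by the file name.
wc -l "$demo/sample.tsv"

# Reading from standard input prints only the number. Some systems pad it
# with spaces, so tr strips them before the value is reused.
lines=$(wc -l < "$demo/sample.tsv" | tr -d ' ')
echo "lines: $lines"
```

The second form is convenient when the count is needed inside a script rather than on screen.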
To introduce the shell, let’s look at a complete string of shell commands that finds the number of articles published in 2009 in academic history journals whose title contains the word ‘International’ in the 500,000-line data file.
grep 2009 ~/desktop/libcarp-data-notes/data/2014-01_JA.tsv | grep INTERNATIONAL | awk -F'\t' '{print $5}' | sort | uniq -c
40 AFRICA -LONDON- INTERNATIONAL AFRICAN INSTITUTE-
1 ARCHAEOLOGY ETHNOLOGY AND ANTHROPOLOGY OF EURASIA
34 AUSTRALIAN JOURNAL OF INTERNATIONAL AFFAIRS
947 BAR INTERNATIONAL SERIES
105 CHINA REVIEW INTERNATIONAL
70 FOREIGN POLICY -WASHINGTON-
13 GESTA
38 INDONESIAN QUARTERLY
2 INTERNATIONAL AFRICAN LIBRARY
274 INTERNATIONAL HISTORY REVIEW
80 INTERNATIONAL JOURNAL OF AFRICAN HISTORICAL STUDIES
17 INTERNATIONAL JOURNAL OF AFRICAN RENAISSANCE STUDIES
23 INTERNATIONAL JOURNAL OF CONTEMPORARY IRAQI STUDIES
16 INTERNATIONAL JOURNAL OF HISTORICAL ARCHAEOLOGY
13 INTERNATIONAL JOURNAL OF IBERIAN STUDIES
217 INTERNATIONAL JOURNAL OF MARITIME HISTORY
169 INTERNATIONAL JOURNAL OF MIDDLE EAST STUDIES
95 INTERNATIONAL JOURNAL OF NAUTICAL ARCHAEOLOGY
70 INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
25 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
55 INTERNATIONAL JOURNAL OF THE CLASSICAL TRADITION
46 INTERNATIONAL REVIEW OF SOCIAL HISTORY
42 INTERNATIONALES ASIENFORUM
28 JEUNE AFRIQUE
1 JOURNAL OF AFRICAN AMERICAN HISTORY
31 JOURNAL OF CONTEMPORARY AFRICAN STUDIES
1 JOURNAL OF MODERN AFRICAN STUDIES
2 RESEARCH IN INTERNATIONAL STUDIES SOUTHEAST ASIA SERIES
1 REVOLUTIONARY RUSSIA
grep 2009 ~/desktop/libcarp-data-notes/data/2014-01_JA.tsv | head -10
History_1a-rdf.tsv Harnish, D. 2 26 ASIAN MUSIC 0044-9202 (Uk)RN000200906 ASIAN MUSIC 26(2), 157-159. (1995) Cilokaq Music of Lombok xxu eng SOCIETY FOR ASIAN MUSIC, INC. 1995
History_1a-rdf.tsv Arana, M. 2 26 ASIAN MUSIC 0044-9202 (Uk)RN000200918 ASIAN MUSIC 26(2), 160-162. (1995) Eternal Voices: Traditional Vietnamese Music in the United States xxu eng SOCIETY FOR ASIAN MUSIC, INC. 1995
History_1a-rdf.tsv Ming-mei, Y. 2 26 ASIAN MUSIC 0044-9202 (Uk)RN000200920 ASIAN MUSIC 26(2), 163-165. (1995) Soul of China - Guqin Recital by Professor LI Xiang Ting xxu eng SOCIETY FOR ASIAN MUSIC, INC. 1995
History_1a-rdf.tsv Manuel, P. 2 26 ASIAN MUSIC 0044-9202 (Uk)RN000200931 ASIAN MUSIC 26(2), 166-167. (1995) Retooling a Tradition: A Rajasthani Puppet Takes Umbrage at His Stringholders xxu eng SOCIETY FOR ASIAN MUSIC, INC. 1995
History_1a-rdf.tsv Rawnlsey, G. D. 3 9 CONTEMPORARY RECORD 0950-9224 (Uk)RN002372009 CONTEMPORARY RECORD 9(3), 586-601. (1995) How Special is Special? The Anglo-American Alliance During the Cuban Missile Crisis xxk eng FRANK CASS LONDON 1995
History_1a-rdf.tsv Kraus, H.-C. 1 262 HISTORISCHE ZEITSCHRIFT 0018-2613 (Uk)RN003512009 HISTORISCHE ZEITSCHRIFT 262(1), 162-163. (1996) J. J. Sack, From Jacobite to Conservative. Reaction and Orthodoxy in Britain, c. 1760-1832 gw eng BOEHLAU VERLAG HISTORISCHE ANTHROPOLOGIE 1996
History_1a-rdf.tsv Frantz, J. B. 1 53 WILLIAM AND MARY QUARTERLY 0043-5597 (Uk)RN003732009 WILLIAM AND MARY QUARTERLY 53(1), 216-218. (1996) Griffin, Revolution and Religion: American Revolutionary War and the Reformed Clergy xxu eng INSTITUTE OF EARLY AMERICAN HISTORY AND CULTURE 1996
History_1a-rdf.tsv Markoff, I. 2 29 MIDDLE EAST STUDIES ASSOCIATION BULLETIN (Uk)RN005420090 MIDDLE EAST STUDIES ASSOCIATION BULLETIN 29(2), 157-160. (1995) Introduction to Sufi Music and Ritual in Turkey xxk eng THE MIDDLE EAST STUDIES ASSOCIATION OF NORTH AMERICA, INC. 1995
History_1a-rdf.tsv Keefe, S. E. 1 80 GEORGIA HISTORICAL QUARTERLY 0016-8297 (Uk)RN008662009 GEORGIA HISTORICAL QUARTERLY 80(1), 206-207. (1996) Loftin, Healing Hands xxu eng UNIVERSITY OF GEORGIA 1996
History_1a-rdf.tsv Downes, P. 4 29 EIGHTEENTH CENTURY STUDIES 0013-2586 (Uk)RN011720092 EIGHTEENTH CENTURY STUDIES 29(4), 413-432. (1996) Sleep-Walking Out of the Revolution: Brown's Edgar Huntly xxu eng THE JOHNS HOPKINS UNIVERSITY PRESS 1996
grep 2009 ~/desktop/libcarp-data-notes/data/2014-01_JA.tsv | grep INTERNATIONAL | head -10
History_1b-rdf.tsv Cherry, S. 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200927951 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 7-30. (2006) Medicine and rural health care in nineteenth and early twentieth century Europe xxk eng lincoln 2006
History_1b-rdf.tsv Barona, J. L. 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200927966 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 31-51. (2006) Rural health, medicine and cultural interaction in late nineteenth and early twentieth century Spain xxk eng lincoln 2006
History_1b-rdf.tsv Andresen, A. 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200927976 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 52-72. (2006) Towards equality? Rural health and health legislation in Norway, 1860-1912 xxk eng lincoln 2006
History_1b-rdf.tsv Cherry, S. 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200927987 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 73-91. (2006) Zemstvo health care in rural Russia c. 1864-1917 xxk eng lincoln 2006
History_1b-rdf.tsv Farr, I. 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200927990 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 92-108. (2006) Medical topographies in nineteenth century Bavaria xxk eng lincoln 2006
History_1b-rdf.tsv Williamson, T. 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200928000 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 109-122. (2006) The disappearance of malaria from the East Anglian Fens xxk eng lincoln 2006
History_1b-rdf.tsv Moll, I. 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200928016 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 123-137. (2006) Health networks in nineteenth century rural Majorca xxk eng lincoln 2006
History_1b-rdf.tsv Andresen, A. 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200928026 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 138-157. (2006) Perspectives on the interaction of medicine and rural cultures: Spain, Norway and European Russia, 1860 - 1910 xxk eng lincoln 2006
History_1b-rdf.tsv 2 2 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 1750-0478 (Uk)RN200928037 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES 2(2), 158-160. (2006) I.Borowy and W.D. Gruner (eds) Facing illness in troubled times: health in Europe in the interwar years 1918-1939 xxk eng lincoln 2006
History_1b-rdf.tsv Piliang, I.J. 3 35 INDONESIAN QUARTERLY 0304-2170 (Uk)RN224067136 INDONESIAN QUARTERLY 35(3), 206-220. (2007) Seeking for Victory, Not Biting The Dust: Political Parties' Situation Prior to 2009 General Elections io eng CENTRE FOR STRATEGIC AND INTERNATIONAL STUDIES 2007
grep 2009 ~/desktop/libcarp-data-notes/data/2014-01_JA.tsv | grep INTERNATIONAL | awk -F'\t' '{print $5}' | head -20
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
INDONESIAN QUARTERLY
INDONESIAN QUARTERLY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
grep 2009 ~/desktop/lc-data/libcarp-data-notes/shell-data/2014-02-01_JA-art.tsv | grep INTERNATIONAL | awk -F'\t' '{print $5}' | sort
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
BAR INTERNATIONAL SERIES
CHINA REVIEW INTERNATIONAL
CHINA REVIEW INTERNATIONAL
CHINA REVIEW INTERNATIONAL
CHINA REVIEW INTERNATIONAL
CHINA REVIEW INTERNATIONAL
CHINA REVIEW INTERNATIONAL
CHINA REVIEW INTERNATIONAL
FOREIGN POLICY -WASHINGTON-
INTERNATIONAL JOURNAL OF AFRICAN HISTORICAL STUDIES
INTERNATIONAL JOURNAL OF AFRICAN HISTORICAL STUDIES
INTERNATIONAL JOURNAL OF CONTEMPORARY IRAQI STUDIES
INTERNATIONAL JOURNAL OF MARITIME HISTORY
INTERNATIONAL JOURNAL OF MARITIME HISTORY
INTERNATIONAL JOURNAL OF MARITIME HISTORY
INTERNATIONAL JOURNAL OF MARITIME HISTORY
INTERNATIONAL JOURNAL OF MIDDLE EAST STUDIES
INTERNATIONAL JOURNAL OF NAUTICAL ARCHAEOLOGY
INTERNATIONAL JOURNAL OF THE CLASSICAL TRADITION
INTERNATIONAL JOURNAL OF THE CLASSICAL TRADITION
INTERNATIONAL JOURNAL OF THE CLASSICAL TRADITION
As uniq -c is the last command in the pipeline, the shell then prints its results to the terminal by default, i.e. the number of articles published in 2009 in academic journals whose title contains the word ‘International’, with counts separated by journal.
There are a few false positives, but this is still a good start: from 500,000 lines of journal article metadata to a few numbers and names just by typing one line of code.
grep 2009 ~/desktop/libcarp-data-notes/data/2014-01_JA.tsv | grep INTERNATIONAL | awk -F'\t' '{print $5}' | sort | uniq -c
40 AFRICA -LONDON- INTERNATIONAL AFRICAN INSTITUTE-
1 ARCHAEOLOGY ETHNOLOGY AND ANTHROPOLOGY OF EURASIA
34 AUSTRALIAN JOURNAL OF INTERNATIONAL AFFAIRS
947 BAR INTERNATIONAL SERIES
105 CHINA REVIEW INTERNATIONAL
70 FOREIGN POLICY -WASHINGTON-
13 GESTA
38 INDONESIAN QUARTERLY
2 INTERNATIONAL AFRICAN LIBRARY
274 INTERNATIONAL HISTORY REVIEW
80 INTERNATIONAL JOURNAL OF AFRICAN HISTORICAL STUDIES
17 INTERNATIONAL JOURNAL OF AFRICAN RENAISSANCE STUDIES
23 INTERNATIONAL JOURNAL OF CONTEMPORARY IRAQI STUDIES
16 INTERNATIONAL JOURNAL OF HISTORICAL ARCHAEOLOGY
13 INTERNATIONAL JOURNAL OF IBERIAN STUDIES
217 INTERNATIONAL JOURNAL OF MARITIME HISTORY
169 INTERNATIONAL JOURNAL OF MIDDLE EAST STUDIES
95 INTERNATIONAL JOURNAL OF NAUTICAL ARCHAEOLOGY
70 INTERNATIONAL JOURNAL OF OSTEOARCHAEOLOGY
25 INTERNATIONAL JOURNAL OF REGIONAL AND LOCAL STUDIES
55 INTERNATIONAL JOURNAL OF THE CLASSICAL TRADITION
46 INTERNATIONAL REVIEW OF SOCIAL HISTORY
42 INTERNATIONALES ASIENFORUM
28 JEUNE AFRIQUE
1 JOURNAL OF AFRICAN AMERICAN HISTORY
31 JOURNAL OF CONTEMPORARY AFRICAN STUDIES
1 JOURNAL OF MODERN AFRICAN STUDIES
2 RESEARCH IN INTERNATIONAL STUDIES SOUTHEAST ASIA SERIES
1 REVOLUTIONARY RUSSIA
Point out that this string of shell commands is an example of how powerful the shell can be for quickly finding information in a large data set.
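The false positives arise because grep matches its pattern anywhere on a line, for example 2009 inside an article title or INTERNATIONAL inside a publisher name. One possible way to tighten this, sketched below, is to let awk test specific columns instead. The sample file uses a simplified, hypothetical eight-column layout in which the year is column 8; in the real data file the column positions would need to be checked first.

```shell
# Scratch file with a simplified, hypothetical layout:
# 1 source, 2 author, 3 issue, 4 volume, 5 journal, 6 title, 7 publisher, 8 year
demo=$(mktemp -d)
data="$demo/sample.tsv"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  'f.tsv' 'Smith, A.' '1' '10' 'INTERNATIONAL HISTORY REVIEW' 'A study' 'SOME PRESS' '2009' \
  'f.tsv' 'Jones, B.' '2' '10' 'INTERNATIONAL HISTORY REVIEW' 'Another study' 'SOME PRESS' '2009' \
  'f.tsv' 'Lee, C.' '3' '35' 'INDONESIAN QUARTERLY' 'Prior to 2009 General Elections' 'CENTRE FOR STRATEGIC AND INTERNATIONAL STUDIES' '2007' \
  'f.tsv' 'Kim, D.' '1' '5' 'CHINA REVIEW INTERNATIONAL' 'A review' 'SOME PRESS' '2008' \
  > "$data"

# Loose version, as in the lesson: both patterns may match in any column,
# so the 2007 INDONESIAN QUARTERLY row slips through.
loose=$(grep 2009 "$data" | grep INTERNATIONAL | awk -F'\t' '{print $5}' | sort | uniq -c)

# Stricter version: the year column must equal 2009 and the journal
# column must contain INTERNATIONAL.
strict=$(awk -F'\t' '$8 == "2009" && $5 ~ /INTERNATIONAL/ {print $5}' "$data" | sort | uniq -c)

echo "loose:";  echo "$loose"
echo "strict:"; echo "$strict"
```

The loose version is quicker to type and fine for a first look; the column-based version trades a little typing for fewer surprises in the counts.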
We will begin with the basics of navigating the Unix shell.
On Windows, start Git Bash; on macOS, open Terminal.
This is our command line, and the $ is the 'command prompt', which shows that the system is ready for your input.
Try typing pwd and hitting enter.
pwd
/Users/rotsuji/desktop/libcarp-data-notes
The standard output displays your current location in the OS file system.
Type ls and hit enter, and you see a list of every file and directory within your current location.
ls
03-foundations-slides.html LC-ShellNotes-2.ipynb
03-foundations-slides.md README.md
04-regex1-slides.html data
LC-ShellNotes-1.ipynb
Flags are additions to a command that provide the computer with a bit more guidance of what sort of output or manipulation you want.
If you type 'ls -l' and hit enter, the computer returns a long listing with extra details about each file:
ls -l
total 432
-rw-r--r-- 1 rotsuji staff 11408 Jul 16 23:08 03-foundations-slides.html
-rw-r--r-- 1 rotsuji staff 11366 Jul 16 23:08 03-foundations-slides.md
-rw-r--r-- 1 rotsuji staff 6537 Jul 16 23:08 04-regex1-slides.html
-rw-r--r--@ 1 rotsuji staff 178893 Jul 16 23:18 LC-ShellNotes-1.ipynb
-rw-r--r-- 1 rotsuji staff 3759 Jul 16 23:03 LC-ShellNotes-2.ipynb
-rw-r--r-- 1 rotsuji staff 107 Jul 16 23:08 README.md
drwxr-xr-x 11 rotsuji staff 374 Jul 16 23:13 data
For example, the listing shows the size of each file in bytes, the date it was last modified, and the file name.
The '-h' flag, when used with the '-l' flag, adds the unit suffixes Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the number of digits to three or fewer, using base 2 for sizes.
By typing 'ls -lh' and hitting enter you receive an output in a human-readable format (note that the order of the flags doesn’t matter: 'ls -hl' gives the same result).
ls -lh
total 432
-rw-r--r-- 1 rotsuji staff 11K Jul 16 23:08 03-foundations-slides.html
-rw-r--r-- 1 rotsuji staff 11K Jul 16 23:08 03-foundations-slides.md
-rw-r--r-- 1 rotsuji staff 6.4K Jul 16 23:08 04-regex1-slides.html
-rw-r--r--@ 1 rotsuji staff 175K Jul 16 23:18 LC-ShellNotes-1.ipynb
-rw-r--r-- 1 rotsuji staff 3.7K Jul 16 23:03 LC-ShellNotes-2.ipynb
-rw-r--r-- 1 rotsuji staff 107B Jul 16 23:08 README.md
drwxr-xr-x 11 rotsuji staff 374B Jul 16 23:13 data
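The difference between the two listings can be reproduced on a file of known size. This is a small sketch in a scratch directory; the file name notes.txt is hypothetical, and the exact column layout of the listing varies between systems.

```shell
demo=$(mktemp -d)

# Write exactly 2048 space characters into a file (hypothetical name).
printf '%2048s' '' > "$demo/notes.txt"

# Long listing: the size column shows the raw byte count, 2048.
long=$(ls -l "$demo")

# Long listing with -h: the same size is shown with a unit suffix instead.
human=$(ls -lh "$demo")

# The flags can be combined in either order.
swapped=$(ls -hl "$demo")

echo "$long"
echo "$human"
```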
Use the man command to invoke the manual page (documentation) for a Shell command.
Now that you’ve spent some time in your home directory, let’s go somewhere else. You can browse through your directories by typing the 'cd' (Change Directory) command followed by the directory name. Before you do that...
Start from our home directory
cd ~
pwd
/Users/rotsuji
cd desktop
pwd
/Users/rotsuji/desktop
cd ~
pwd
/Users/rotsuji
cd -
/Users/rotsuji/desktop
cd -
/Users/rotsuji
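The same moves can be rehearsed safely in a scratch directory. The sketch below uses made-up directory names (projects, data) and shows going down with a relative path, up with .., and back to the previous location with cd -.

```shell
# Build a small, hypothetical directory tree to move around in.
demo=$(mktemp -d)
mkdir -p "$demo/projects/data"

cd "$demo/projects"
cd data              # relative path: move down into a sub-directory
here=$(pwd)

cd ..                # move up to the parent directory
parent=$(pwd)

cd - > /dev/null     # jump back to the previous directory (cd - also prints it)
back=$(pwd)

echo "down:   $here"
echo "up:     $parent"
echo "return: $back"
```

Note that cd - toggles between the last two locations only; it is not a full history.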
Try exploring: move around the computer, get used to moving in and out of directories, see how different file types appear in the Unix shell.
Being able to navigate your file system using the bash shell is very important for using the Unix shell effectively.
As you become more comfortable, you’ll soon find yourself skipping directly to the directory that you want.
After this basic introduction, within the Unix shell, you can now:
As well as navigating directories, you can interact with files on the command line: you can read them, open them, run them, and even edit them, often without ever having to leave the interface.
Sometimes it is easier to do this using a Graphical User Interface, such as Word or your normal explorer, but the more you work here, the more it is useful, and the more you write scripts, the more you’ll need this basic knowledge.
Here are a few basic ways to interact with files.
Go to your data folder
Type pwd to show your current location.
pwd
/Users/rotsuji
Browse to your data folder using the commands we just learned
cd desktop/libcarp-data-notes/data/
pwd
/Users/rotsuji/desktop/libcarp-data-notes/data
ls -F
2014-01-31_JA-africa.tsv* 2014-02-01_JA-art .tsv*
2014-01-31_JA-america.tsv* 2014-02-02_JA-britain.tsv*
2014-01_JA.tsv 829-0.txt*
2014-01_JA.tsv.zip gallic.txt*
mkdir firstdir
ls -F
2014-01-31_JA-africa.tsv* 2014-02-02_JA-britain.tsv*
2014-01-31_JA-america.tsv* 829-0.txt*
2014-01_JA.tsv firstdir/
2014-01_JA.tsv.zip gallic.txt*
2014-02-01_JA-art .tsv*
cd firstdir
pwd
/Users/rotsuji/desktop/libcarp-data-notes/data/firstdir
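The trailing characters in the ls -F listings are markers added by the -F flag: a / after directories and a * after files that have execute permission. A short sketch with made-up names (notes.txt, run.sh) in a scratch directory shows both, along with mkdir -p for creating nested directories in one step.

```shell
demo=$(mktemp -d)
cd "$demo"

touch notes.txt run.sh
chmod +x run.sh        # give run.sh execute permission
mkdir firstdir

# ls -F marks directories with / and executable files with *.
listing=$(ls -F)
echo "$listing"

# -p creates any missing parent directories along the way.
mkdir -p firstdir/a/b
ls -F firstdir
```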
There’s a trick to make things a bit quicker. Go up one directory (cd ..). To navigate to the firstdir directory you could type cd firstdir. Alternatively, you could type cd f and then hit tab. You will notice that the interface completes the line to cd firstdir. Hitting tab at any time within the shell will prompt it to attempt to auto-complete the line based on the files or sub-directories in the current directory. Where two or more files start with the same characters, the auto-complete will only fill up to the first point of difference, after which you can add more characters and try using tab again. We would encourage using this method throughout today to see how it behaves (as it saves loads of time and effort!).
The next thing we are going to do is manipulate files using the shell.
cd /Users/rotsuji/desktop/libcarp-data-notes/data/
pwd
/Users/rotsuji/desktop/libcarp-data-notes/data
ls -lh
total 346224
-rwxr-xr-x 1 rotsuji staff 3.6M Jul 16 23:08 2014-01-31_JA-africa.tsv
-rwxr-xr-x 1 rotsuji staff 7.4M Jul 16 23:08 2014-01-31_JA-america.tsv
-rw-rw-r-- 1 rotsuji staff 125M Jun 10 2015 2014-01_JA.tsv
-rw-r--r-- 1 rotsuji staff 29M Jul 16 23:08 2014-01_JA.tsv.zip
-rwxr-xr-x 1 rotsuji staff 1.8M Jul 16 23:08 2014-02-01_JA-art .tsv
-rwxr-xr-x 1 rotsuji staff 1.4M Jul 16 23:08 2014-02-02_JA-britain.tsv
-rwxr-xr-x 1 rotsuji staff 598K Jul 16 23:08 829-0.txt
drwxr-xr-x 2 rotsuji staff 68B Jul 16 23:22 firstdir
-rwxr-xr-x 1 rotsuji staff 13B Jul 16 23:08 gallic.txt
There is a command-line utility available in the shell that lets you read files.
* The command is the cat or concatenate command.
* The basic command reads files sequentially, writing them to standard output.
cat 829-0.txt | head -50
The output is quite long. To view only the first several lines, we'll use the head command.
head 829-0.txt
tail 829-0.txt
Using head and tail is a good way to quickly determine the contents of the file.
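By default head and tail each show ten lines; the -n option changes that number. A minimal sketch on a generated file (numbers.txt is a made-up name holding the numbers 1 to 100, one per line):

```shell
demo=$(mktemp -d)

# Generate a 100-line file, one number per line.
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > "$demo/numbers.txt"

# First three lines of the file.
first=$(head -n 3 "$demo/numbers.txt")

# Last two lines of the file.
last=$(tail -n 2 "$demo/numbers.txt")

echo "$first"
echo "$last"
```

Passing the file name directly to head gives the same result as piping cat into head, with one process fewer.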
Another way to navigate files is to view the contents one screen at a time.
Type less 829-0.txt to see the first screen, spacebar to see the next screen and so on, then q to quit (return to the command prompt).
less 829-0.txt (does not work in jupyter - check terminal on screen for output)
Now we want to change the file name to something more descriptive.
* You can ‘move’ it to a new name by using the mv or move command.
* To do this type mv 829-0.txt gulliver.txt and hit enter.
* This is equivalent to the ‘rename file’ function.
mv 829-0.txt gulliver.txt
Check that the file has been renamed by typing the command ls -F.
ls -F
2014-01-31_JA-africa.tsv* 2014-02-02_JA-britain.tsv*
2014-01-31_JA-america.tsv* firstdir/
2014-01_JA.tsv gallic.txt*
2014-01_JA.tsv.zip gulliver.txt*
2014-02-01_JA-art .tsv*
Now, if you wanted to duplicate the file instead of renaming it, you could use the cp or copy command: type cp gulliver.txt 829-0.txt and hit enter.
cp gulliver.txt 829-0.txt
ls -F
2014-01-31_JA-africa.tsv* 2014-02-02_JA-britain.tsv*
2014-01-31_JA-america.tsv* 829-0.txt*
2014-01_JA.tsv firstdir/
2014-01_JA.tsv.zip gallic.txt*
2014-02-01_JA-art .tsv* gulliver.txt*
For demo purposes, to compare the two .txt files, run the cat command for both files.
cat 829-0.txt | head -10
cat gulliver.txt | head -10
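Comparing the first lines by eye works for a demo. As an alternative that checks the whole file, the cmp utility compares two files byte by byte; with -s it prints nothing and reports the result only through its exit status. The sketch below uses made-up file names and contents.

```shell
demo=$(mktemp -d)
printf 'line one\nline two\n' > "$demo/original.txt"

# Duplicate the file, then check that the copy matches byte for byte.
cp "$demo/original.txt" "$demo/copy.txt"
if cmp -s "$demo/original.txt" "$demo/copy.txt"; then
  after_copy="identical"
else
  after_copy="different"
fi

# Change the copy and compare again.
printf 'an extra line\n' >> "$demo/copy.txt"
if cmp -s "$demo/original.txt" "$demo/copy.txt"; then
  after_edit="identical"
else
  after_edit="different"
fi

echo "after cp:   $after_copy"
echo "after edit: $after_edit"
```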
Use cp to duplicate the gulliver.txt file and give it the filename gulliver-backup.txt: any ideas how you do that?
$ cp gulliver.txt gulliver-backup.txt
cp gulliver.txt gulliver-backup.txt
ls -F
2014-01-31_JA-africa.tsv* 829-0.txt*
2014-01-31_JA-america.tsv* firstdir/
2014-01_JA.tsv gallic.txt*
2014-01_JA.tsv.zip gulliver-backup.txt*
2014-02-01_JA-art .tsv* gulliver.txt*
2014-02-02_JA-britain.tsv*
Now that you have two copies of Gulliver’s Travels, we are going to put them together to make an even longer book.
To combine, or concatenate, two or more files use the cat command again. Type cat gulliver.txt gulliver-backup.txt and press enter.
cat gulliver.txt gulliver-backup.txt | head -50
Use the up arrow to get to the last command, amend the line to cat gulliver.txt gulliver-backup.txt > gulliver-twice.txt and press enter (check terminal for output).
cat gulliver.txt gulliver-backup.txt > gulliver-twice.txt
Now, when you type ls -F you’ll see gulliver-twice.txt appear in your directory.
ls -F
2014-01-31_JA-africa.tsv* 829-0.txt*
2014-01-31_JA-america.tsv* firstdir/
2014-01_JA.tsv gallic.txt*
2014-01_JA.tsv.zip gulliver-backup.txt*
2014-02-01_JA-art .tsv* gulliver-twice.txt
2014-02-02_JA-britain.tsv* gulliver.txt*
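A quick way to confirm what the redirect did is to count lines before and after. The sketch below uses a made-up three-line file in a scratch directory; it also shows >>, which appends to a file where > would overwrite it.

```shell
demo=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$demo/book.txt"
cp "$demo/book.txt" "$demo/book-backup.txt"

# > sends the combined output into a new file instead of the screen.
cat "$demo/book.txt" "$demo/book-backup.txt" > "$demo/book-twice.txt"

single=$(wc -l < "$demo/book.txt" | tr -d ' ')
double=$(wc -l < "$demo/book-twice.txt" | tr -d ' ')

# >> appends to the existing file rather than replacing its contents.
cat "$demo/book.txt" >> "$demo/book-twice.txt"
triple=$(wc -l < "$demo/book-twice.txt" | tr -d ' ')

echo "$single $double $triple"
```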
When combining more than two files, using a wildcard can help avoid having to write out each filename individually.
pwd
/Users/rotsuji/desktop/libcarp-data-notes/data
ls -F
2014-01-31_JA-africa.tsv* 829-0.txt*
2014-01-31_JA-america.tsv* firstdir/
2014-01_JA.tsv gallic.txt*
2014-01_JA.tsv.zip gulliver-backup.txt*
2014-02-01_JA-art .tsv* gulliver-twice.txt
2014-02-02_JA-britain.tsv* gulliver.txt*
Now, if you type cat *.txt > everything-together.txt and hit enter, all the .txt files in the current directory are combined in alphabetical order as everything-together.txt.
cat *.txt > everything-together.txt
ls -F
2014-01-31_JA-africa.tsv* everything-together.txt
2014-01-31_JA-america.tsv* firstdir/
2014-01_JA.tsv gallic.txt*
2014-01_JA.tsv.zip gulliver-backup.txt*
2014-02-01_JA-art .tsv* gulliver-twice.txt
2014-02-02_JA-britain.tsv* gulliver.txt*
829-0.txt*
cat everything-together.txt | head -50
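To see what the wildcard expands to before using it, echo can print the matched names. The sketch below uses three made-up files, created out of order on purpose. It writes the combined file into a sub-directory (combined, a hypothetical name), which is one way to keep the output from matching *.txt itself if the command is run a second time.

```shell
demo=$(mktemp -d)
cd "$demo"

# Create the files out of alphabetical order.
printf 'from b\n' > b.txt
printf 'from a\n' > a.txt
printf 'from c\n' > c.txt

# The shell expands the pattern to a sorted list before cat ever runs.
echo *.txt

# Combine the files; the result lands outside the pattern's reach.
mkdir combined
cat *.txt > combined/everything-together.txt
cat combined/everything-together.txt
```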
Another wildcard worth remembering is ?, which is a placeholder for a single character (a letter, digit, or symbol).
* We will not cover this wildcard now.
* We shall return to shell wildcards later. For now, note again that they are similar to, but not exactly the same as, the Regex we saw in the previous module.
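For reference, a brief sketch of the difference between the two wildcards, using made-up file names in a scratch directory: ? stands for exactly one character, while * stands for any number of characters, including none.

```shell
demo=$(mktemp -d)
cd "$demo"
touch part1.txt part2.txt part10.txt notes.txt

# ? matches exactly one character, so part10.txt is left out.
one_char=$(echo part?.txt)

# * matches any run of characters, so all three part files match.
any_len=$(echo part*.txt)

echo "$one_char"
echo "$any_len"
```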
Finally, onto deleting. We won’t use it now, but if you do want to delete a file, for whatever reason, the command is rm, or remove.
Be careful with the rm command, as you don’t want to delete files that you do not mean to.
* Unlike deleting from within your Graphical User Interface, there is no recycling bin or undo option.
* For that reason, if you are in doubt, you may want to exercise caution or maintain a regular backup of your data.
The syntax for rm is the same as cp and mv: for example rm gulliver.txt
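One cautious habit, sketched below in a scratch directory with made-up file names, is to preview what a pattern matches with ls before handing the same pattern to rm.

```shell
demo=$(mktemp -d)
cd "$demo"
touch keep.txt draft1.tmp draft2.tmp

# Preview: list exactly what the pattern matches.
ls draft*.tmp

# Remove a single named file, then everything else the pattern matches.
rm draft1.tmp
rm draft*.tmp

# Only keep.txt should be left.
remaining=$(ls)
echo "$remaining"

# In an interactive session, rm -i asks for confirmation before each deletion.
```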
Within the Unix shell you can now: