grep
(global regular expression print) searches across multiple files for specific character strings
Combined with the > operator, grep is a powerful data mining tool: it searches for patterns across multiple files and subsets the matches into new derived files
Start in the data/ directory
pwd
Should be something like /Users/riley/Desktop/data
Let's look across all the .tsv files for instances of 1999 (character cluster)
grep 1999 *.tsv
grep -c 1999 *.tsv
The -c flag prints how many lines contain 1999 in each *.tsv file
The year in this instance corresponds to the date field for each journal article in the file (look at the file)
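To see what these two commands do on a small scale, here is a minimal sketch using made-up data. The file names sample-africa.tsv and sample-america.tsv and their contents are assumptions for illustration, not the lesson's real files. Note that -c counts matching lines, so a line containing 1999 twice still counts once.

```shell
# Hypothetical sample data: two tiny tab-separated files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'Title A\t1999\nTitle B\t2001\nTitle C\t1999\n' > sample-africa.tsv
printf 'Title D\t1999\nTitle E\t1999 and 1999 again\n' > sample-america.tsv

# Print every matching line, prefixed with the name of the file it came from.
matches=$(grep 1999 *.tsv)
echo "$matches"

# Print one count per file: the number of matching lines, not of occurrences.
counts=$(grep -c 1999 *.tsv)
echo "$counts"
```

With several files on the command line, grep prefixes each result with the file name, which is what makes the per-file counts readable.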
Strings need not be numbers
grep -c revolution *.tsv
This counts the lines containing the string revolution within the defined files and prints those counts to the shell
Now let's add the -i flag to our previous command
grep -ci revolution *.tsv
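This repeats the count without regard for case, so it includes both revolution and Revolution
Redirect the output to a file in results/ to save the work
grep -ci revolution *.tsv > results/2016-07-19_JA-revolution.txt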
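The effect of -i and of the > redirect can be checked on a small scale. The sketch below uses hypothetical files (sample-a.tsv, sample-b.tsv, results/sample_counts.txt) that are assumptions for illustration only.

```shell
# Hypothetical sample data in a scratch directory, with a results/ folder.
workdir=$(mktemp -d)
cd "$workdir"
mkdir results
printf 'The French Revolution\t1999\nA quiet revolution\t2001\nNo match here\t2003\n' > sample-a.tsv
printf 'REVOLUTION in print\t1999\n' > sample-b.tsv

# Case-sensitive count: only the lowercase spelling matches.
sensitive=$(grep -c revolution *.tsv)
echo "$sensitive"

# Case-insensitive count: -i also matches Revolution and REVOLUTION.
insensitive=$(grep -ci revolution *.tsv)
echo "$insensitive"

# Save the case-insensitive counts; > overwrites the target file if it exists.
grep -ci revolution *.tsv > results/sample_counts.txt
cat results/sample_counts.txt
```

Nothing is printed by the redirected command itself, because its output goes into the file instead of the shell.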
grep can also be used to create subsets of tabulated data (or any data) from one or multiple files
grep -i revolution *.tsv
This prints every line that contains revolution, without regard for case
grep -i revolution *.tsv > results/2016-07-19_JAi-revolution.tsv
grep provides the -w flag so we can look for whole words, giving greater precision in our search
grep -iw revolution *.tsv > results/2016-07-19_JAiw-revolution.tsv
This now looks in the files and exports any lines containing the whole word revolution
We can now show the difference between the files we created
wc -l results/*.tsv
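To see why the two result files differ in length, here is a minimal sketch with made-up titles. The file names sample.tsv, results/sample_i.tsv and results/sample_iw.tsv are assumptions for illustration. With -w, a match must not be surrounded by letters, digits or underscores, so Revolutionary and revolutions are skipped while Counter-revolution still counts.

```shell
# Hypothetical sample data in a scratch directory, with a results/ folder.
workdir=$(mktemp -d)
cd "$workdir"
mkdir results
printf 'The Revolution\t1999\nRevolutionary ideas\t2000\nrevolutions compared\t2001\nCounter-revolution\t2002\n' > sample.tsv

# Any line containing the string, in any case.
grep -i revolution *.tsv > results/sample_i.tsv
# Only lines where the string stands as a whole word.
grep -iw revolution *.tsv > results/sample_iw.tsv

# Compare the number of lines in each derived file.
wc -l results/*.tsv
loose=$(wc -l < results/sample_i.tsv | tr -d ' ')
strict=$(wc -l < results/sample_iw.tsv | tr -d ' ')
echo "without -w: $loose lines, with -w: $strict lines"
```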
We can use the regular expression syntax covered earlier to search for similar words
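The file gallic.txt contains one such expression
cat gallic.txt
fr[ae]nc[eh]
-- we learned this yesterday
grep can read its search pattern from a file with the --file flag
grep -iw --file=gallic.txt *.tsv
This prints any line containing one of these whole words, in any case:
- france
- french
- frence
- franch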
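Reading the pattern from a file can be tried out as below. This is only a sketch: sample-pattern.txt stands in for gallic.txt, and sample.tsv and its titles are invented for illustration.

```shell
# Hypothetical pattern file and sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'fr[ae]nc[eh]\n' > sample-pattern.txt
printf 'History of France\t1999\nFrench studies\t2000\nFrancophone Africa\t2001\nfranchise law\t2002\n' > sample.tsv

# --file (short form: -f) reads one pattern per line from the named file.
# -i ignores case and -w keeps only whole-word matches.
found=$(grep -iw --file=sample-pattern.txt *.tsv)
echo "$found"
```

Francophone is skipped because the letter after franc is neither e nor h, and franchise is skipped because -w rejects a match that is followed by more letters.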
With the -o flag, grep prints only the matching part of each line (this is handy for isolating/checking results), e.g.
grep -iwo revolution *.tsv
OR:
grep -iwo --file=gallic.txt *.tsv
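A quick way to check what a pattern really matches is to run it with -o on a few known lines. The sketch below assumes hypothetical files sample-pattern.txt, sample-a.tsv and sample-b.tsv; none of them belong to the lesson data.

```shell
# Hypothetical pattern file and sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'fr[ae]nc[eh]\n' > sample-pattern.txt
printf 'France and the French\t1999\n' > sample-a.tsv
printf 'FRANCE revisited\t2000\nfranchise law\t2001\n' > sample-b.tsv

# -o prints each match on its own line instead of the whole matching line,
# so a line with two matches produces two lines of output.
only=$(grep -iwo --file=sample-pattern.txt *.tsv)
echo "$only"
```

Each output line holds the file name and the exact text that matched, which makes unexpected matches easy to spot.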
Search for all case sensitive instances of a word you choose in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to the shell.
Count all case sensitive instances of a word you choose in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to the shell.
Count all case insensitive instances of that word in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to the shell.
Search for all case insensitive instances of that word in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to a new .tsv file.
Search for all case insensitive instances of that whole word in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to a new .tsv file.
Solution
grep -iw hero 2014-01-31* > new2.tsv
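The solution above covers the final exercise. One possible way to approach the other four is sketched here on made-up data, reusing the word hero. The files sample-africa.tsv and sample-america.tsv and the output name new.tsv are assumptions for illustration; with the real data you would use a glob such as 2014-01-31* instead.

```shell
# Hypothetical stand-ins for the 'Africa' and 'America' files.
workdir=$(mktemp -d)
cd "$workdir"
printf 'A hero returns\t1999\nHeroes and heroines\t2000\n' > sample-africa.tsv
printf 'The Hero myth\t2001\nheroic verse\t2002\n' > sample-america.tsv

# 1. Case-sensitive search, printed to the shell.
grep hero sample-a*.tsv

# 2. Case-sensitive count, printed to the shell.
count_cs=$(grep -c hero sample-a*.tsv)
echo "$count_cs"

# 3. Case-insensitive count, printed to the shell.
count_ci=$(grep -ci hero sample-a*.tsv)
echo "$count_ci"

# 4. Case-insensitive search, saved to a new file.
grep -i hero sample-a*.tsv > new.tsv
```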