grep (global regular expression print) searches across multiple files for specific character strings. Combined with the > operator, it is a powerful data mining tool: it searches for patterns across multiple files and subsets the matches into new derived files.

Start in the data/ directory:

pwd

This should print something like /Users/riley/Desktop/data
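Let's look across all the .tsv files for instances of the character string 1999:

grep 1999 *.tsv

This prints every line that contains 1999. To count the matching lines instead of printing them, add the -c flag:

grep -c 1999 *.tsv

This prints the number of lines on which 1999 appeared in each *.tsv file. The year in this instance corresponds to the date field for each journal article in the file (look at the file to check).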
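The difference between printing and counting can be tried out safely on throwaway data. The sketch below is only an illustration: the directory, the file names (sample-a.tsv, sample-b.tsv) and their contents are hypothetical and are not part of the lesson data.

```bash
# Hypothetical sample data, created only for this illustration.
demo_dir=$(mktemp -d)
printf 'Title\tYear\nFirst article\t1999\nSecond article\t2001\nThird article\t1999\n' > "$demo_dir/sample-a.tsv"
printf 'Title\tYear\nFourth article\t1999\n' > "$demo_dir/sample-b.tsv"

# Run in a subshell so the current working directory is left unchanged.
(
  cd "$demo_dir" || exit

  # Print every matching line, prefixed with the name of its file.
  grep 1999 *.tsv

  # Print one count of matching lines per file:
  # sample-a.tsv:2
  # sample-b.tsv:1
  grep -c 1999 *.tsv
)
```

Note that -c counts matching lines, so a line that mentions 1999 twice still counts once.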
Strings need not be numbers:

grep -c revolution *.tsv

This counts the lines containing the string revolution within the defined files and prints those counts to the shell.
Now let's add the -i flag to our previous command:

grep -ci revolution *.tsv

The -i flag makes the search case insensitive (it matches both revolution and Revolution).

Use the results/ directory to save the work (create it first if it does not exist):

grep -ci revolution *.tsv > results/2016-07-19_JA-revolution.txt

Where grep comes in handy is that you can use it to create subsets of tabulated data (or any data) from one or multiple files:

grep -i revolution *.tsv

This prints every line containing revolution without regard for case. Save that subset to a file:

grep -i revolution *.tsv > results/2016-07-19_JAi-revolution.txt

grep provides the -w flag so we can look for whole words, which gives greater precision in our search:

grep -iw revolution *.tsv > results/2016-07-19_JAiw-revolution.tsv

This now looks in the files and exports any lines containing the whole word revolution.
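To see how the -i and -w flags change what is matched, here is a minimal sketch on throwaway data. The file sample.tsv, its four lines, and the output name whole-word.tsv are hypothetical and chosen only for illustration.

```bash
# Hypothetical sample data, created only for this illustration.
demo_dir=$(mktemp -d)
printf 'The revolution of 1789\nRevolution and reform\nA revolutionary pamphlet\nTrade and empire\n' > "$demo_dir/sample.tsv"

# Run in a subshell so the current working directory is left unchanged.
(
  cd "$demo_dir" || exit

  grep -c revolution *.tsv     # case sensitive, also matches "revolutionary": 2
  grep -ci revolution *.tsv    # ignoring case, "Revolution" now matches too: 3
  grep -ciw revolution *.tsv   # ignoring case, whole words only: 2

  # Save the whole-word subset; the target directory must exist first.
  mkdir -p results
  grep -iw revolution *.tsv > results/whole-word.tsv
)
```

With a single input file grep prints the bare count; with several files each count is prefixed by its file name.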
We can now show the differences between the files we created:

wc -l results/*

We can use the regular expression syntax covered earlier to search for similar words.
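Create a file called gallic.txt, then check its contents:

cat gallic.txt

The file holds the pattern fr[ae]nc[eh], which we learned yesterday. Now use it with grep:

grep -iw --file=gallic.txt *.tsv

This matches the following whole words, regardless of case:

- france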
- french
- frence
- franch
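The pattern-file approach can be checked on a small example. This is a sketch under stated assumptions: the file sample.tsv and its contents are hypothetical, and gallic.txt is recreated inside a temporary directory so that nothing in data/ is touched.

```bash
# Hypothetical sample data, created only for this illustration.
demo_dir=$(mktemp -d)
printf 'Travels in France\nThe French court\nA franchise agreement\nFrankfurt book fair\n' > "$demo_dir/sample.tsv"

# A pattern file holds one regular expression per line.
printf 'fr[ae]nc[eh]\n' > "$demo_dir/gallic.txt"

# Run in a subshell so the current working directory is left unchanged.
(
  cd "$demo_dir" || exit

  # Prints the "France" and "French" lines only. "franchise" contains
  # "franch" but is rejected by -w because it is not a whole word.
  grep -iw --file=gallic.txt *.tsv
)
```

Adding more lines to the pattern file searches for all of the patterns in one pass.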
With the -o flag, we will print only the matching part of the lines (this is handy for isolating/checking results), e.g.

grep -iwo revolution *.tsv

OR:

grep -iwo --file=gallic.txt *.tsv

Search for all case sensitive instances of a word you choose in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to the shell.
Count all case sensitive instances of a word you choose in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to the shell.
Count all case insensitive instances of that word in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to the shell.
Search for all case insensitive instances of that word in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to a new .tsv file.
Search for all case insensitive instances of that whole word in the ‘America’ and ‘Africa’ tsv files in this directory. Print your results to a new .tsv file.
Solution
~~~
grep -iw hero 2014-01-31* > new2.tsv
~~~
{: .bash}
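The solution above answers the last exercise. The other four follow the same pattern, and one possible set of answers is sketched below, reusing the word hero and the 2014-01-31* file pattern from that solution. The two sample files and the output name new.tsv are hypothetical stand-ins so that the commands can be run without the lesson data.

```bash
# Hypothetical stand-ins for the 'Africa' and 'America' files.
demo_dir=$(mktemp -d)
printf 'A hero of the revolt\nHeroes and villains\nThe Hero returns\n' > "$demo_dir/2014-01-31_sample-africa.tsv"
printf 'No match here\nAn unsung hero\n' > "$demo_dir/2014-01-31_sample-america.tsv"

# Run in a subshell so the current working directory is left unchanged.
(
  cd "$demo_dir" || exit

  grep hero 2014-01-31*                  # 1. case sensitive search, printed to the shell
  grep -c hero 2014-01-31*               # 2. case sensitive count per file
  grep -ci hero 2014-01-31*              # 3. case insensitive count per file
  grep -i hero 2014-01-31* > new.tsv     # 4. case insensitive search, saved to a file
  grep -iw hero 2014-01-31* > new2.tsv   # 5. whole word only, as in the solution above
)
```

On this sample data new.tsv holds four lines and new2.tsv holds three, because -w drops the line that only contains "Heroes".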