10. Multiple file input
10. Multiple file input ๊ด๋ จ
You have already seen blocks like BEGIN
, END
and statements like next
. This chapter will discuss features that are useful to make decisions around each file when there are multiple files passed as input.
Info
The example_files directory has all the files used in the examples.
BEGINFILE
, ENDFILE
and FILENAME
BEGINFILE
: this block gets executed before the start of each input fileENDFILE
: this block gets executed after processing each input fileFILENAME
: special variable having the filename of the current input file
Here are some examples:
can also use:
awk 'BEGINFILE{printf "--- %s ---\n", FILENAME} 1'
awk 'BEGINFILE{print "--- " FILENAME " ---"} 1' greeting.txt table.txt
# --- greeting.txt ---
# Hi there
# Have a nice day
# Good bye
# --- table.txt ---
# brown bread mat hair 42
# blue cake mug shirt -7
# yellow banana window shoes 3.14
same as:
tail -q -n1 greeting.txt table.txt
awk 'ENDFILE{print $0}' greeting.txt table.txt
# Good bye
# yellow banana window shoes 3.14
# nextfile
nextfile
The nextfile
statement helps to skip the remaining records from the current file being processed and move on to the next file. Note that the ENDFILE
block will still be executed, if present.
print filename
if it contains 'I
' anywhere in the file
same as:
grep -l 'I' f[1-3].txt greeting.txt
awk '/I/{print FILENAME; nextfile}' f[1-3].txt greeting.txt
# f1.txt
# f2.txt
print filename
if it contains both 'o
' and 'at
' anywhere in the file
awk 'BEGINFILE{m1=m2=0} /o/{m1=1} /at/{m2=1}
m1 && m2{print FILENAME; nextfile}' f[1-3].txt greeting.txt
# f2.txt
# f3.txt
print filename
if it contains 'at
' but not 'o
'
awk 'BEGINFILE{m1=m2=0} /o/{m1=1; nextfile} /at/{m2=1}
ENDFILE{if(!m1 && m2) print FILENAME}' f[1-3].txt greeting.txt
# f1.txt
Warning
nextfile
cannot be used in the BEGIN
or END
or ENDFILE
blocks. See gawk manual: nextfile for more details, how it affects ENDFILE
and other special cases.
ARGC
and ARGV
The ARGC
special variable contains the total number of arguments passed to the awk
command, including awk
itself as an argument. The ARGV
special array contains the arguments themselves.
note that the index starts with '0' here
awk 'BEGIN{for(i=0; i<ARGC; i++) print ARGV[i]}' f[1-3].txt greeting.txt
# awk
# f1.txt
# f2.txt
# f3.txt
# greeting.txt
Similar to manipulating NF
and modifying $N
field contents, you can change the values of ARGC
and ARGV
to control how the arguments should be processed.
However, not all arguments are necessarily filenames. awk
allows assigning variable values without -v
option if it is done in the place where you usually provide file arguments. For example:
awk 'BEGIN{for(i=0; i<ARGC; i++) print ARGV[i]}' table.txt n=5 greeting.txt
# awk
# table.txt
# n=5
# greeting.txt
In the above example, the variable n
will get a value of 5
after awk
has finished processing the table.txt
file. Here's an example where FS
is changed between two files.
cat table.txt
# brown bread mat hair 42
# blue cake mug shirt -7
# yellow banana window shoes 3.14
cat books.csv
# Harry Potter,Mistborn,To Kill a Mocking Bird
# Matilda,Castle Hangnail,Jane Eyre
for table.txt
, FS
will be the default value for books.csv
, FS
will be the comma character OFS
is comma for both the files
awk -v OFS=, 'NF=2' table.txt FS=, books.csv
# brown,bread
# blue,cake
# yellow,banana
# Harry Potter,Mistborn
# Matilda,Castle Hangnail
Info
See stackoverflow: extract positions 2-7 from a fasta sequence for a practical example of changing field/record separators between the files being processed.
Summary
This chapter introduced few more special blocks and variables are that handy for processing multiple file inputs. These will show up in examples in the coming chapters as well.
Next chapter will discuss use cases where you need to take decisions based on multiple input records.
Exercises
Info
The exercises directory has all the files used in this section.
Exercise 1
Print the last field of the first two lines for the input files table.txt
, scores.csv
and fw.txt
. The field separators for these files are space, comma and fixed width respectively. To make the output more informative, print filenames and a separator as shown in the output below. Assume that the input files will have at least two lines.
awk ##### add your solution here
# >table.txt<
# 42
# -7
# ----------
# >scores.csv<
# Chemistry
# 99
# ----------
# >fw.txt<
# 0.134563
# 6
# ----------
awk 'BEGINFILE{print ">" FILENAME "<"} {print $NF} FNR==2{print "----------";
nextfile}' table.txt FS=, scores.csv FIELDWIDTHS='14 *' fw.txt
# >table.txt<
# 42
# -7
# ----------
# >scores.csv<
# Chemistry
# 99
# ----------
# >fw.txt<
# 0.134563
# 6
# ----------
Exercise 2
For the input files sample.txt
, secrets.txt
, addr.txt
and table.txt
, display only the names of files that contain at or fun in the third field. Assume space as the field separator.
awk ##### add your solution here sample.txt secrets.txt addr.txt table.txt
# secrets.txt
# addr.txt
awk '$3 ~ /fun|at/{print FILENAME; nextfile}' sample.txt secrets.txt addr.txt table.txt
# secrets.txt
# addr.txt
# table.txt