We wrote a MapReduce program in Python using Hadoop Streaming in an earlier post. I recently got a request from a friend asking how files are read in Python, so this post is dedicated to that request. For all the work in this post, we will be using the command-line interpreter of Python 3.7.0 (the latest version of Python at the time of writing).
The version is shown below:
F:\PythonPrograms\files>python --version
Python 3.7.0
Once we invoke python, we can start entering commands as shown below:
F:\PythonPrograms\files>python
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
We will be using a file called sample_file.txt at F:\PythonPrograms\files\ with the content below:
Python is a programming language that lets you work quickly and integrate systems more effectively.
Reading this file comprises three operations: opening the file, reading the file, and closing the file. We can open the file with the command below:
f = open('sample_file.txt')
To read the file, we use the command below:
f.read()
Finally, we close the file object to release any system resources it holds:
f.close()
The outputs are shown below:
>>> f = open('sample_file.txt')
>>> f.read()
'Python is a programming language that lets you work quickly and integrate systems more effectively.'
>>> f.close()
>>>
The open function takes a second argument called mode. Its default value is r, which stands for read-only. The mode is a string of one, two, or three characters. In the three-character combinations used in this post, the third character is +. The details about the characters are given below:
1) r : read. The file is opened in read-only mode. If the file does not exist, FileNotFoundError is raised. If the file exists, the file pointer is placed at the beginning of the file
2) w : write. The file is opened for writing. If the file does not exist, a new file is created for writing. If the file exists, its existing contents are discarded as soon as it is opened
3) a : append. The file is opened for appending. If the file does not exist, a new file is created for appending. If the file exists, its contents are kept and the file pointer is placed at the end of the file
4) + : This is not an independent mode. + is used in combination with any one of the three options r, w, or a. If the file is opened in read-only mode, adding + allows writing as well. If the file is opened in write-only or append-only mode, adding + allows reading as well
5) b : binary format. Like +, this is not an independent mode. It is used in combination with any one of r, w, or a, and makes reads return bytes objects instead of strings
Apart from r, w, and a on their own, we can have combinations containing one of r, w, or a with b and/or +, such as r+, rb, rb+, w+, wb, wb+, a+, ab, and ab+. The meanings of these combinations follow from points 1 to 5 above
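The behaviour of the base modes can be checked with a short script. This is only a minimal sketch and not part of the session above: it works in a temporary directory, and the file name modes_demo.txt is made up for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so that no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    # 'r' on a file that does not exist raises FileNotFoundError.
    try:
        open(path, 'r')
        missing_file_error = False
    except FileNotFoundError:
        missing_file_error = True

    # 'w' creates the file (or empties it if it already exists).
    with open(path, 'w') as f:
        f.write('first')

    # 'a' keeps the existing content and writes at the end of the file.
    with open(path, 'a') as f:
        f.write(' second')

    # 'r+' allows reading and writing; the file must already exist.
    with open(path, 'r+') as f:
        content_after_append = f.read()

    # Opening with 'w' again discards the old content.
    with open(path, 'w') as f:
        f.write('third')
    with open(path) as f:
        content_after_rewrite = f.read()

print(missing_file_error)
print(content_after_append)
print(content_after_rewrite)
```

The script prints True, then first second, then third.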
The file object returned by the open function has the attributes and methods below:
f.name returns the file name
f.mode returns the mode
f.closed returns a boolean indicating whether the file is closed (True) or open (False)
f.isatty() returns True if the stream is interactive
f.fileno() returns the integer file descriptor that the operating system associates with the file
f.readable() returns True if the stream can be read from
f.seekable() returns True if the stream allows random access
f.writable() returns True if the stream can be written to
The usage is shown below:
>>> f = open('sample_file.txt')
>>> f.name
'sample_file.txt'
>>> f.mode
'r'
>>> f.closed
False
>>> f.isatty()
False
>>> f.fileno()
3
>>> f.readable()
True
>>> f.seekable()
True
>>> f.writable()
False
>>> f.close()
>>>
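The same checks can be scripted to see how the answers change with the mode. The sketch below is an illustration under assumptions: it uses a temporary directory and a made-up file name, attributes_demo.txt, rather than the files from this post.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'attributes_demo.txt')  # hypothetical file name

    # Opened for writing only: not readable, but writable.
    with open(path, 'w') as f:
        write_info = (f.mode, f.readable(), f.writable())

    # Opened with the default mode: readable, not writable, still open.
    with open(path) as f:
        read_info = (f.mode, f.readable(), f.writable(), f.closed)

    # After the with block the same object reports that it is closed.
    closed_after_with = f.closed

print(write_info)
print(read_info)
print(closed_after_with)
```

Here write_info is ('w', False, True), read_info is ('r', True, False, False), and closed_after_with is True.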
We can also use an absolute file path, as shown below:
>>> with open('F:\\PythonPrograms\\files\\sample_file.txt', 'r') as f:
...     f.read()
...
'Python is a programming language that lets you work quickly and integrate systems more effectively.'
>>> f.closed
True
>>>
We have used with to open the file. When we use with, the file is closed automatically at the end of the block, so we do not need to call close() explicitly. We can also read a file with a for loop, which yields one line at a time. Note that without with we have to close the file explicitly, as shown below:
>>> f = open('sample_file.txt')
>>> for line in f:
...     print(line)
...
Python is a programming language that lets you work quickly and integrate systems more effectively.
>>> f.closed
False
>>> f.close()
>>>
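The for loop and with can also be combined, so that the file is closed without an explicit close(). The following is a small sketch under assumptions: the file loop_demo.txt and its three lines are invented for the example and are written to a temporary directory first.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'loop_demo.txt')  # hypothetical file name

    # Create a small file with Windows-style line endings.
    with open(path, 'wb') as f:
        f.write(b'first line\r\nsecond line\r\nthird line')

    # Iterate over the file inside a with block.
    lines_seen = []
    with open(path) as f:
        for line in f:
            lines_seen.append(line)

    # No close() call was needed.
    closed_after_loop = f.closed

print(lines_seen)
print(closed_after_loop)
```

In text mode the \r\n endings are read back as \n, so lines_seen is ['first line\n', 'second line\n', 'third line'], and closed_after_loop is True.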
Let us use another file called blue_carbuncle.txt to show another use of the for loop. The contents of blue_carbuncle.txt are shown below:
1. THE ADVENTURE OF THE BLUE CARBUNCLE
2.
3. I had called upon my friend Sherlock Holmes upon the second
4. morning after Christmas, with the intention of wishing him the
5. compliments of the season.
Running the for loop gives the result below. Each line already ends with a newline character and print() adds another, so the lines are separated by blank lines:
>>> f = open('blue_carbuncle.txt')
>>> for line in f:
...     print(line)
...
1. THE ADVENTURE OF THE BLUE CARBUNCLE

2.

3. I had called upon my friend Sherlock Holmes upon the second

4. morning after Christmas, with the intention of wishing him the

5. compliments of the season.
>>> f.close()
>>>
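Each line read from the file already ends with its own newline character, and print() appends one more by default. Passing end='' to print() suppresses that extra newline; it does not remove the newlines that are part of the file:
>>> f = open('blue_carbuncle.txt')
>>> for line in f:
...     print(line, end='')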
...
1. THE ADVENTURE OF THE BLUE CARBUNCLE
2.
3. I had called upon my friend Sherlock Holmes upon the second
4. morning after Christmas, with the intention of wishing him the
5. compliments of the season.>>>
>>> f.close()
>>>
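To make the difference visible, the text that print() produces can be captured and inspected. This is a sketch only: the helper printed() is a made-up name, and the three strings stand in for the lines a text-mode file would yield.

```python
import io
from contextlib import redirect_stdout

# Lines as a text-mode file would yield them; the last one has no newline.
lines = ['1. THE ADVENTURE OF THE BLUE CARBUNCLE\n', '2.\n', '3. I had called']


def printed(lines, **print_options):
    """Return exactly what print() writes for each line (hypothetical helper)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        for line in lines:
            print(line, **print_options)
    return buffer.getvalue()


# Default print(): the file's newline plus the one added by print().
default_output = printed(lines)

# end='': only the newlines that are part of the file remain.
compact_output = printed(lines, end='')

# Alternative: strip the file's newline and let print() add its own.
stripped_output = printed([line.rstrip('\n') for line in lines])

print(repr(default_output))
print(repr(compact_output))
print(repr(stripped_output))
```

The rstrip('\n') variant has the advantage that the last line also ends with a newline, so the prompt does not end up on the same line as the output.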
readlines() reads the whole file into a list with one element per line. Since sample_file.txt contains a single line, the list has a single element:
>>> f = open('sample_file.txt')
>>> f.readlines()
['Python is a programming language that lets you work quickly and integrate systems more effectively.']
>>> f.close()
>>>
The same output is obtained by passing the file object to list():
>>> f = open('sample_file.txt')
>>> file_in_list = list(f)
>>> f.close()
>>> print(file_in_list)
['Python is a programming language that lets you work quickly and integrate systems more effectively.']
>>>
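With a file of several lines the list has several elements, each keeping its trailing newline. The sketch below assumes a made-up three-line file, readlines_demo.txt, created in a temporary directory; it also shows read().splitlines(), which is one way to get the lines without the newlines.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'readlines_demo.txt')  # hypothetical file name

    with open(path, 'wb') as f:
        f.write(b'line one\r\nline two\r\nline three')

    # readlines() keeps the newline at the end of each line.
    with open(path) as f:
        with_newlines = f.readlines()

    # read().splitlines() drops the line endings.
    with open(path) as f:
        without_newlines = f.read().splitlines()

print(with_newlines)
print(without_newlines)
```

Both approaches load the entire file into memory, so for large files iterating with a for loop is usually the safer choice.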
To read a file line by line, we can use readline(). The code below calls readline() just once, so only the first line of the file is returned:
>>> with open('blue_carbuncle.txt', 'r') as f:
...     f.readline()
...
'1. THE ADVENTURE OF THE BLUE CARBUNCLE\n'
>>>
If we need to read more lines, we can call readline() as many times as we need:
>>> with open('blue_carbuncle.txt', 'r') as f:
...     f.readline()
...     f.readline()
...     f.readline()
...
'1. THE ADVENTURE OF THE BLUE CARBUNCLE\n'
'2.\n'
'3. I had called upon my friend Sherlock Holmes upon the second\n'
>>>
We can pass a size argument to readline() to limit how much of the line is read. In text mode the size counts characters (for this ASCII file, characters and bytes coincide):
>>> with open('blue_carbuncle.txt', 'r') as f:
...     f.readline()
...     f.readline()
...     f.readline(15)
...
'1. THE ADVENTURE OF THE BLUE CARBUNCLE\n'
'2.\n'
'3. I had called'
>>>
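When the number of lines is not known in advance, readline() can be called in a loop until it returns an empty string, which signals the end of the file. This is a minimal sketch with a made-up file, readline_demo.txt; note that a blank line in the file is returned as '\n', not as an empty string.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'readline_demo.txt')  # hypothetical file name

    # The second line of this file is blank.
    with open(path, 'wb') as f:
        f.write(b'first\r\n\r\nthird')

    collected = []
    with open(path) as f:
        while True:
            line = f.readline()
            if line == '':  # empty string: end of file reached
                break
            collected.append(line)

print(collected)
```

The list contains 'first\n', '\n', and 'third', so the blank line does not stop the loop.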
We can also limit how much read() returns by setting its optional size argument. In binary mode the size is a number of bytes:
>>> with open('blue_carbuncle.txt', 'rb') as f:
...     f.read(16)
...
b'1. THE ADVENTURE'
>>>
The read mode above is binary, so the result is a bytes object. In text mode the size counts characters and the result is a string; for this ASCII file the content is the same:
>>> with open('blue_carbuncle.txt', 'r') as f:
...     f.read(16)
...
'1. THE ADVENTURE'
>>>
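The two modes only agree here because the file is plain ASCII. The sketch below, which is an illustration and not part of the session above, writes a made-up UTF-8 file containing one accented character and reads four units in each mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'size_demo.txt')  # hypothetical file name

    # '\u00e9' (e with acute accent) takes two bytes in UTF-8.
    with open(path, 'wb') as f:
        f.write('caf\u00e9 au lait'.encode('utf-8'))

    # Binary mode: size counts bytes, so the accented letter is cut in half.
    with open(path, 'rb') as f:
        four_bytes = f.read(4)

    # Text mode: size counts characters, so the whole word is returned.
    with open(path, 'r', encoding='utf-8') as f:
        four_characters = f.read(4)

print(four_bytes)
print(four_characters)
```

four_bytes is b'caf\xc3', while four_characters is the complete four-letter word.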
tell() returns an integer giving the current position of the file pointer. In binary mode this is the number of bytes from the beginning of the file:
>>> with open('blue_carbuncle.txt', 'rb') as f:
...     f.read(16)
...     f.tell()
...
b'1. THE ADVENTURE'
16
>>>
If we wish to read from a particular position in the file, we can use f.seek(offset, whence), which moves the file pointer to offset, measured from the reference point selected by whence. The default value of whence is 0.
If whence is set to 0, the offset is measured from the beginning of the file
If whence is set to 1, the offset is measured from the current position of the file pointer
If whence is set to 2, the offset is measured from the end of the file (so the offset is usually negative)
Let us see a few examples now:
>>> with open('blue_carbuncle.txt', 'rb') as f:
...     f.read(16)
...     f.tell()
...     f.read(16)
...     f.tell()
...
b'1. THE ADVENTURE'
16
b' OF THE BLUE CAR'
32
>>>
Let us now add seek(0) and reset the file pointer to the start of the file:
>>> with open('blue_carbuncle.txt', 'rb') as f:
...     f.read(16)
...     f.tell()
...     f.seek(0)
...     f.tell()
...     f.read(16)
...     f.tell()
...
b'1. THE ADVENTURE'
16
0
0
b'1. THE ADVENTURE'
16
>>>
seek(0) returns the new position of the file pointer, which tell() confirms. In the next example, we set whence to 1 and the offset to 1. If the file is opened in text mode instead of binary mode, we get an error, because text files only allow seeks relative to the beginning of the file (apart from a zero offset with whence 1 or 2):
>>> with open('blue_carbuncle.txt', 'r') as f:
...     f.read(16)
...     f.tell()
...     f.seek(1,1)
...     f.tell()
...     f.read(16)
...     f.tell()
...
'1. THE ADVENTURE'
16
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
io.UnsupportedOperation: can't do nonzero cur-relative seeks
>>>
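What text mode still permits can be sketched as follows. This is an illustration under assumptions: the file text_seek_demo.txt is made up and holds only the first line of the story. The idea is to remember a position with tell() and return to it later with seek(), which counts as a seek from the beginning of the file.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'text_seek_demo.txt')  # hypothetical file name

    with open(path, 'wb') as f:
        f.write(b'1. THE ADVENTURE OF THE BLUE CARBUNCLE')

    with open(path, 'r') as f:
        f.read(16)

        # A nonzero seek relative to the current position is rejected.
        try:
            f.seek(1, io.SEEK_CUR)
            relative_seek_allowed = True
        except io.UnsupportedOperation:
            relative_seek_allowed = False

        # Remember where we are, jump to the start, then come back.
        saved_position = f.tell()
        f.seek(0)
        first_two = f.read(2)
        f.seek(saved_position)
        rest = f.read()

print(relative_seek_allowed)
print(first_two)
print(rest)
```

In text mode the value returned by tell() should be treated as an opaque marker and only passed back to seek(), not used for arithmetic.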
Let us now set the mode to 'rb':
>>> with open('blue_carbuncle.txt', 'rb') as f:
...     f.read(16)
...     f.tell()
...     f.seek(1,1)
...     f.tell()
...     f.read(16)
...     f.tell()
...
b'1. THE ADVENTURE'
16
17
17
b'OF THE BLUE CARB'
33
>>>
The offset of 1 skips the space after ADVENTURE. In the last example on seek, we set whence to 2 and use a negative offset to move back from the end of the file:
>>> with open('blue_carbuncle.txt', 'rb') as f:
...     f.seek(-7,2)
...     f.tell()
...     f.read(6)
...     f.tell()
...
197
197
b'season'
203
>>>
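The three whence values are also available as named constants in the io module (io.SEEK_SET, io.SEEK_CUR, and io.SEEK_END, equal to 0, 1, and 2), which can make the calls easier to read. The sketch below assumes a made-up file, seek_demo.txt, holding only the first line of the story (38 bytes).

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'seek_demo.txt')  # hypothetical file name

    with open(path, 'wb') as f:
        f.write(b'1. THE ADVENTURE OF THE BLUE CARBUNCLE')

    with open(path, 'rb') as f:
        # 3 bytes from the beginning: skips '1. '.
        f.seek(3, io.SEEK_SET)
        word_from_start = f.read(3)

        # 1 byte forward from the current position: skips the space.
        f.seek(1, io.SEEK_CUR)
        word_from_current = f.read(9)

        # 9 bytes back from the end of the file.
        f.seek(-9, io.SEEK_END)
        word_from_end = f.read()
        final_position = f.tell()

print(word_from_start, word_from_current, word_from_end, final_position)
```

This prints b'THE' b'ADVENTURE' b'CARBUNCLE' 38.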
Using truncate() we can resize the file. The file must be opened in a mode that allows writing. Let us truncate blue_carbuncle.txt to a size of 91 bytes, as shown below:
>>> f = open('blue_carbuncle.txt','rb+')
>>> f.truncate(91)
91
>>> f.close()
>>>
After truncate, the content of blue_carbuncle.txt is shown below:
1. THE ADVENTURE OF THE BLUE CARBUNCLE
2.
3. I had called upon my friend Sherlock Holmes
Let us restore blue_carbuncle.txt to its original content. If no argument is specified, truncate() resizes the file to the current position of the file pointer, as shown below:
>>> f = open('blue_carbuncle.txt','rb+')
>>> f.read(91)
b'1. THE ADVENTURE OF THE BLUE CARBUNCLE\r\n2.\r\n3. I had called upon my friend Sherlock Holmes '
>>> f.truncate()
91
>>> f.close()
>>>
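Both forms of truncate() can be tried safely on a scratch file. The following is a minimal sketch under assumptions: truncate_demo.txt is a made-up ten-byte file created in a temporary directory, so the files used in this post are not modified.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'truncate_demo.txt')  # hypothetical file name

    with open(path, 'wb') as f:
        f.write(b'0123456789')

    # Explicit size: keep the first 6 bytes.
    with open(path, 'rb+') as f:
        size_explicit = f.truncate(6)
    with open(path, 'rb') as f:
        after_explicit = f.read()

    # No argument: cut the file at the current position (after 3 bytes).
    with open(path, 'rb+') as f:
        f.read(3)
        size_current = f.truncate()
    with open(path, 'rb') as f:
        after_current = f.read()

print(size_explicit, after_explicit)
print(size_current, after_current)
```

The first truncate returns 6 and leaves b'012345'; the second returns 3 and leaves b'012'.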
After truncate, the content of blue_carbuncle.txt is shown below:
1. THE ADVENTURE OF THE BLUE CARBUNCLE
2.
3. I had called upon my friend Sherlock Holmes
This is in line with our expectation. Lastly, we explore write. Let us create a new file for writing as follows:
>>> f = open('sample_file_new.txt','wb')
>>> f.write(b'Python is a programming language that lets you work quickly and integrate systems more effectively.')
99
>>> f.close()
>>>
The contents of the above file are shown below:
Python is a programming language that lets you work quickly and integrate systems more effectively.
Let us append one more sentence to the same file in append mode ('ab', since we are writing bytes):
>>> f = open('sample_file_new.txt','ab')
>>> f.write(b'\r\nPython is a programming language that lets you work quickly and integrate systems more effectively.')
101
>>> f.close()
>>>
The contents of the above file are shown below:
Python is a programming language that lets you work quickly and integrate systems more effectively.
Python is a programming language that lets you work quickly and integrate systems more effectively.
Let us now overwrite this file in write mode ('wb'):
>>> f = open('sample_file_new.txt','wb')
>>> f.write(b'Python is a programming language')
32
>>> f.close()
Now, the contents of sample_file_new.txt are:
Python is a programming language
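The same steps also work in text mode, where write() takes a string and returns the number of characters written. This is a small sketch under assumptions: the file write_demo.txt and the appended sentence are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'write_demo.txt')  # hypothetical file name

    # 'w' creates the file; write() returns the number of characters written.
    with open(path, 'w', encoding='utf-8') as f:
        chars_written = f.write('Python is a programming language')

    # 'a' adds to the end; '\n' is written as the platform's line ending.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('\nthat lets you work quickly.')

    with open(path, encoding='utf-8') as f:
        lines = f.read().splitlines()

print(chars_written)
print(lines)
```

chars_written is 32, matching the byte count in the binary example because the text is plain ASCII, and lines contains the two sentences.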
This concludes our discussion on reading files in Python.