by rupe

How do I read a huge file line by line in Python, without loading the entire thing into memory first?

In Python, the most common way to read lines from a file is to do the following:

for line in open('myfile','r').readlines():
    do_something(line)


However, readlines() reads the entire file into memory and returns it as a list of lines before the loop starts. A better approach for large files is to use the fileinput module, as follows:

import fileinput
for line in fileinput.input(['myfile']):
    do_something(line)


The fileinput.input() call reads lines sequentially and doesn't keep them in memory after they've been read.
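
As a sketch of fileinput working through more than one large file, the hypothetical helper count_matching_lines below counts, per file, the lines that contain a given substring while holding only the current line in memory. The helper name and the substring search are illustrative assumptions; fileinput.input() and its filename() method are part of the standard fileinput module.

```python
import fileinput


def count_matching_lines(paths, needle):
    """Count lines containing `needle` in each file listed in `paths`.

    Hypothetical helper for illustration: fileinput hands back one line at a
    time, so memory use does not grow with file size.
    """
    hits = {}
    # The object returned by fileinput.input() is a context manager; leaving
    # the with-block closes whichever file is still open.
    with fileinput.input(files=paths) as lines:
        for line in lines:
            if needle in line:
                # filename() reports the file the current line came from.
                name = lines.filename()
                hits[name] = hits.get(name, 0) + 1
    return hits
```

For example, count_matching_lines(['access.log', 'error.log'], 'timeout') (file names hypothetical) returns a dict keyed by file name. Note that passing an empty list makes fileinput fall back to reading standard input.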

Annotation by enki:
Since a file object is iterable, why not simply iterate over it directly? (Use iter() when you need more control over the iterator's state.)

For example:

with open('myfile', 'r') as f:
    for line in f:
        do_something(line)
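
A minimal sketch of the same idea with the file closed automatically and iter() used for finer control: because a file object is its own iterator, next() can consume a header line and the for loop then resumes from the second line. The helper summarize and the header-line file layout are assumptions made for illustration.

```python
def summarize(path):
    """Return (header, data_line_count, longest_data_line) for a text file.

    Hypothetical helper: assumes the first line of the file is a header.
    Only one line is held in memory at a time.
    """
    with open(path, "r") as f:
        lines = iter(f)  # for a file object, iter(f) returns f itself
        header = next(lines, "").rstrip("\n")  # consume only the first line
        count = 0
        longest = 0
        for line in lines:  # resumes right after the header
            count += 1
            longest = max(longest, len(line.rstrip("\n")))
    return header, count, longest
```

By contrast, calling f.readlines() here would build a list of every line before the first one could be processed.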

From The Yak's Frequently Questioned Answers (mod.2008-11-08)
