Python readlines returns an empty string

Wictorian · July 14, 2022

This happens on the initial call.

    f=open("asal_db.txt", "a+")
    imp = f.readlines()

This code returns nothing (readlines() gives back an empty list). The file is not empty. How is this even possible?

tikker · July 14, 2022

Probably because you are opening it in append mode, which starts at the end of the file.

https://stackoverflow.com/questions/1466000/difference-between-modes-a-a-w-w-and-r-in-built-in-open-function

Quote

"a+" Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.

If you want to read it, open it in the "r" mode.
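A minimal sketch of the problem and the usual fix, assuming a throwaway file named demo.txt (a hypothetical stand-in, not the original asal_db.txt). With "a+" the stream starts at the end, so readlines() has nothing left to read until you seek(0); writes still go to the end afterwards.

```python
# Create a small file to experiment with (demo.txt is a hypothetical name).
with open("demo.txt", "w") as f:
    f.write("first\nsecond\n")

with open("demo.txt", "a+") as f:
    # "a+" positions the stream at the end of the file, so nothing is left to read.
    before_seek = f.readlines()
    print(before_seek)      # → []

    # Move back to the start before reading.
    f.seek(0)
    lines = f.readlines()
    print(lines)            # → ['first\n', 'second\n']

    # In append mode, writes always land at the end, whatever the current position.
    f.write("third\n")

with open("demo.txt") as f:
    final = f.read()
print(final.splitlines())   # → ['first', 'second', 'third']
```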

C2dan88 · July 14, 2022

Because with a+ the file is opened for reading and writing, but the pointer is placed at the end of the file. Calling f.readlines() returns an empty list, even if the file contains data.

With r+ mode the pointer is placed at the beginning of the file. Calling f.readlines() will return all the data in the file.
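One way to see the difference is to ask each file object where it starts with tell(). This is a small sketch assuming a scratch file named demo.txt (hypothetical):

```python
import os

# demo.txt is a hypothetical scratch file used only for this comparison.
with open("demo.txt", "w") as f:
    f.write("hello\n")

size = os.path.getsize("demo.txt")

with open("demo.txt", "r+") as f:
    start_r_plus = f.tell()   # r+ starts at offset 0 ...
    data = f.read()           # ... so read() sees the whole file

with open("demo.txt", "a+") as f:
    start_a_plus = f.tell()   # a+ starts at the end of the file ...
    empty = f.read()          # ... so read() returns '' here

print(start_r_plus, start_a_plus == size, repr(data), repr(empty))
```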

mariushm · July 14, 2022

Some observations:

ENTER can be made up of TWO bytes / TWO characters.

In Windows, ENTER is CR LF or \r \n -- CR is short for CARRIAGE RETURN (move cursor to beginning of line), LF is short for LINE FEED (advance one line).

In Linux, ENTER is just LF or \n - basically Linux assumes that if the document says advance one line, you also mean to "snap" the write position to the left, start of line.

Very old Macs (Mac OS 9 and earlier) used only CR (\r). macOS since then uses the Unix/Linux convention, LF (\n).

When you use .readline() to read a single line in text mode, by default Python recognizes ENTER in any of these forms, and if a line ends with both CR and LF (\r\n), Python removes the \r and leaves only \n at the end of the string.
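A quick way to watch this translation happen is to write Windows-style line endings as raw bytes, then read the file once in text mode and once in binary mode. A sketch, assuming a scratch file named demo.txt (hypothetical):

```python
# Write Windows-style line endings as raw bytes (demo.txt is a hypothetical scratch file).
with open("demo.txt", "wb") as f:
    f.write(b"one\r\ntwo\r\n")

# Text mode with the default newline=None: \r\n is translated to \n.
with open("demo.txt", "r") as f:
    text_lines = f.readlines()

# Binary mode: no translation, the \r bytes are still there.
with open("demo.txt", "rb") as f:
    raw_lines = f.readlines()

print(text_lines)  # → ['one\n', 'two\n']
print(raw_lines)   # → [b'one\r\n', b'two\r\n']
```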

When you write text with write() or writelines() in text mode, Python translates every \n to os.linesep, a constant that depends on the operating system: \r\n on Windows, \n on Linux and macOS.

So for example, if you open a text file created in Windows and read it line by line using readline and write these lines directly to another file, you may not get an identical copy of the original file, because Python has stripped those CR (\r) characters from the end of that line you read.

So if you copy a text file made in Windows (which uses \r\n) with Python running under Linux, write() will write \n. If you run Python under Windows, os.linesep is \r\n, so write() turns each \n back into \r\n.
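The write-side translation can be checked by writing in text mode and reading the raw bytes back. A sketch, assuming a scratch file named demo.txt (hypothetical); the comparison holds on any platform because it uses os.linesep itself:

```python
import os

# demo.txt is a hypothetical scratch file.
with open("demo.txt", "w") as f:      # newline=None (the default)
    f.write("a\nb\n")                 # each \n becomes os.linesep on disk

with open("demo.txt", "rb") as f:
    on_disk = f.read()

sep = os.linesep.encode()
expected = b"a" + sep + b"b" + sep
print(on_disk == expected)  # True: b'a\r\nb\r\n' on Windows, b'a\nb\n' on Linux/macOS
```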

You can tell Python to only consider some combinations as new line when you use the open function, see https://docs.python.org/3/library/functions.html#open

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
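Passing newline='' on both the read and the write side is one way to get the byte-identical copy that the default mode does not guarantee. A sketch, assuming hypothetical scratch files demo.txt and copy.txt:

```python
# demo.txt and copy.txt are hypothetical scratch files.
with open("demo.txt", "wb") as f:
    f.write(b"one\r\ntwo\n")  # mixed line endings on purpose

# newline='': lines are split on any ending, but endings are returned untranslated.
with open("demo.txt", newline="") as f:
    kept = f.readlines()

# newline='' on writing: no translation takes place, so the copy keeps every byte.
with open("copy.txt", "w", newline="") as f:
    f.writelines(kept)

with open("demo.txt", "rb") as a, open("copy.txt", "rb") as b:
    identical = a.read() == b.read()

print(kept)       # → ['one\r\n', 'two\n']
print(identical)  # → True
```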

PS: all this assumes the text file uses a regular codepage or UTF-8. In UTF-16 or UTF-32 every character, including CR and LF, takes multiple bytes, so you have to pass the right encoding= to open(), and anything that scans the raw bytes for line endings gets messy.
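To make the multi-byte point concrete: in UTF-16 a newline is two bytes, so a byte-level scan for a lone 0x0A is not enough, while text mode copes once it knows the encoding. A sketch, assuming a hypothetical scratch file named demo16.txt:

```python
text = "a\nb\n"
raw = text.encode("utf-16-le")
print(raw)  # → b'a\x00\n\x00b\x00\n\x00'  (each \n is the two bytes 0x0A 0x00)

# Text mode handles UTF-16 once the encoding is given (demo16.txt is hypothetical).
with open("demo16.txt", "w", encoding="utf-16") as f:
    f.write(text)
with open("demo16.txt", encoding="utf-16") as f:
    lines = f.readlines()
print(lines)  # → ['a\n', 'b\n']
```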

Wictorian · July 15, 2022

The files are written by Python.

Wictorian · July 15, 2022

19 hours ago, tikker said:

If you want to read it, open it in the "r" mode.

I want to both read it and append to it.

Wictorian · July 15, 2022

18 hours ago, C2dan88 said:

Because with a+ the file is opened for reading and writing but the pointer is placed at the end of the file. Calling f.readlines() will not return anything, even if the file contains data.

OK, but since I want to both read it and append to it, I felt this way would be more efficient.

tikker · July 15, 2022

1 hour ago, Wictorian said:

I want to both read it and append to it.

Then you can use r+ as mentioned above and first read everything, followed by appending what you want to append.
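A minimal sketch of that read-then-append pattern, assuming a hypothetical file asal_db_demo.txt. Note that r+ raises FileNotFoundError if the file does not exist yet, whereas a+ creates it; if that matters, opening with a+ and calling seek(0) before reading is the alternative.

```python
# asal_db_demo.txt is a hypothetical stand-in for the asker's file; r+ needs it to exist.
with open("asal_db_demo.txt", "w") as f:
    f.write("alpha\nbeta\n")

with open("asal_db_demo.txt", "r+") as f:
    existing = f.readlines()   # r+ starts at offset 0, so this reads everything
    # After reading to the end, the position is at end of file,
    # so this write lands after the existing data.
    f.write("gamma\n")

with open("asal_db_demo.txt") as f:
    result = f.read()
print(existing)  # → ['alpha\n', 'beta\n']
```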

mariushm · July 15, 2022

You have a couple of options. The first one:

1. Open the file for reading.
2. Read all the lines into memory.
3. Close the file.
4. Do something with the lines.
5. Open the file for appending (which sets the file pointer to the end of the file).
6. Write your new lines.
7. Close the file.

The downside of this method is that you have to store ALL lines in memory. If your text file is 100 MB, your Python program will use 100 MB+ of memory storing millions of lines of text in memory.
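Those steps map onto two with blocks, each of which closes its file automatically. A sketch, where data.txt and the process() function are hypothetical placeholders for your own file and logic:

```python
# data.txt and process() are hypothetical placeholders.
def process(lines):
    # Example processing: produce one new line per existing line.
    return [f"copy of {line}" for line in lines]

with open("data.txt", "w") as f:   # create sample input
    f.write("a\nb\n")

# Open for reading, read all lines into memory; the with block closes the file.
with open("data.txt") as f:
    lines = f.readlines()

# Do something with the lines.
new_lines = process(lines)

# Open for appending (position at end of file), write, close.
with open("data.txt", "a") as f:
    f.writelines(new_lines)
```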

The other option is to not use readline or readlines, and determine where each line ends by yourself.

Open the file for read / write. Determine the file size and store that somewhere.

Your function will read data from the file until it reaches the file size value stored when you opened the file. You don't want to read until the end of file, because you may have appended lines to the end of the file.

You will have two offsets: offset_read, initially 0 because you start reading at the beginning of the file, and offset_write, initially equal to the file size because writes should go to the end of the file. Or you can simply use seek with the whence parameter set to the end of the file: seek(0, 2) means "move to 0 bytes from the end of the file". See https://www.tutorialspoint.com/python/file_seek.htm
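In Python, seek() returns the new absolute position, so seeking to the end is also a convenient way to get the file size. A sketch, assuming a scratch file named demo.txt (hypothetical), opened in binary mode so offsets are plain byte counts:

```python
import os

# demo.txt is a hypothetical scratch file.
with open("demo.txt", "wb") as f:
    f.write(b"line 1\nline 2\n")

with open("demo.txt", "r+b") as f:
    file_size = f.seek(0, os.SEEK_END)  # same as seek(0, 2); returns the new offset
    offset_read = 0                     # reading starts at the beginning
    offset_write = file_size            # writing starts at the original end

    f.seek(offset_read)                 # whence defaults to os.SEEK_SET (0)
    first_bytes = f.read(6)

print(file_size, offset_write, first_bytes)  # → 14 14 b'line 1'
```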

Each time you try to read a line, you go byte by byte until you find the CR LF or LF sequence, and that's your line of text

Here's a crude example in PHP. It reads a text file line by line while trying to hold fewer than 512 bytes of it in memory at a time, and after every line it reads, it seeks to the end of the file and appends that line (prefixed with "line ") to the file, roughly doubling it.

<?php
$buffer = ''; // buffer to hold a small amount of data from the file at a time
$offset = 0;  // offset in the file
$fsize  = 0;  // file size 

function fill_buffer() {
	global $h,$offset,$buffer,$fsize;
	if ($offset >= $fsize) return 0; // if we reached the original end of file, we can't refill the buffer, nothing to fill with
	$max_size = $fsize-$offset;
	if ($max_size>512) $max_size = 512;
	echo "reading $max_size bytes.\n";
	// each time we read more data from file, we have to set our read pointer there, because we don't know 
	// if an append operation changed it or not
	fseek($h,$offset,SEEK_SET); 
	$read = fread($h,$max_size);
	$offset += strlen($read);
	$buffer .= $read;
	return strlen($read);
}

function read_line() {
	global $buffer;
	// is there a line terminator in the existing buffer? If not, it may be a longer line or the file may not end with a line ending
	$position = strpos($buffer,"\n");
	if ($position===FALSE) {
		$continue = TRUE;
		while ($continue==true) {
			$bytes_read = fill_buffer();
			if ($bytes_read==0) {
				$continue=FALSE; // we reached the end of file, so stop trying to read more data
			} else {
				// is there a new line terminator in the new buffer? If so, stop reading more than needed.
				$position = strpos($buffer,"\n");
				if ($position!==FALSE) $continue=FALSE;
			}
		}
	}
	if ($position!==FALSE) {
		// return only characters up to \n
		$line = substr($buffer,0,$position+1);
		$buffer = substr($buffer,$position+1);
	} else {
		// return the whole buffer, because we tried to fill buffer until a \n was found but had no luck and we reached end of file
		$line = $buffer;
		$buffer = '';
	}
	return $line;
}


$h = fopen('c:/temp/test.txt','r+');	// open file in read/write mode
if ($h === FALSE) die("Could not open file\n");
$fileinfo = fstat($h);
$fsize = $fileinfo['size'];
echo "File size: $fsize \n";

$line = read_line();
while ($line!='') {
	echo $line;
	fseek($h,0,SEEK_END);
	fwrite($h,"line ".$line);
	$line = read_line();
}


fclose($h);
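Since the question is about Python, here is a rough Python port of the same idea. It is a sketch under the same assumptions as the PHP version (read only up to the original file size, refill at most 512 bytes at a time, append each line prefixed with b"line "); double_file() and CHUNK are hypothetical names, not a library API.

```python
import os

CHUNK = 512  # read at most this many bytes per refill, like the PHP version

def double_file(path):
    """Read lines from the original part of the file and append each one,
    prefixed with b"line ", to the end of the same file."""
    with open(path, "r+b") as f:                # binary read/write: offsets are bytes
        fsize = f.seek(0, os.SEEK_END)          # original size; appended data is ignored
        offset = 0                              # next byte to read from the original part
        buffer = b""
        while True:
            # Refill until the buffer holds a full line or the original end is reached.
            while b"\n" not in buffer and offset < fsize:
                f.seek(offset)                  # appends may have moved the position
                chunk = f.read(min(CHUNK, fsize - offset))
                if not chunk:                   # file shrank unexpectedly; stop refilling
                    fsize = offset
                    break
                offset += len(chunk)
                buffer += chunk
            pos = buffer.find(b"\n")
            if pos != -1:
                line, buffer = buffer[:pos + 1], buffer[pos + 1:]
            else:
                line, buffer = buffer, b""      # last line without a trailing \n
            if not line:
                break                           # nothing left in the original part
            f.seek(0, os.SEEK_END)
            f.write(b"line " + line)
```

Working in binary mode keeps the offsets meaningful as byte positions; in text mode, tell() and seek() use opaque cookies and newline translation would shift the counts.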
