
Probably because you are opening it in append mode, which starts at the end of the file.

 

https://stackoverflow.com/questions/1466000/difference-between-modes-a-a-w-w-and-r-in-built-in-open-function

Quote

 ``a+''  Open for reading and writing.  The file is created if it does not
         exist.  The stream is positioned at the end of the file.  Subse-
         quent writes to the file will always end up at the then current
         end of file, irrespective of any intervening fseek(3) or similar.

 

If you want to read it, open it in the "r" mode.



Because with a+ the file is opened for reading and writing, but the pointer is placed at the end of the file. Calling f.readlines() right away returns an empty list, even if the file contains data, unless you first move the pointer back to the start with f.seek(0).

 

With r+ mode the pointer is placed at the beginning of the file. Calling f.readlines() will return all the data in the file.
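
A minimal sketch of the difference, assuming a throwaway file created in a temporary folder (the path and the variable names are made up for the demo):

```python
import os
import tempfile

# Hypothetical demo file; any existing text file would behave the same way.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first\nsecond\n")

# a+ positions the stream at the end, so there is nothing left to read.
with open(path, "a+") as f:
    lines_at_end = f.readlines()
    f.seek(0)                         # rewind to the start of the file
    lines_after_seek = f.readlines()

# r+ starts at the beginning, so the whole file is returned.
with open(path, "r+") as f:
    lines_r_plus = f.readlines()

print(lines_at_end)      # []
print(lines_after_seek)  # ['first\n', 'second\n']
print(lines_r_plus)      # ['first\n', 'second\n']
```

Note that r+ needs the file to exist already, while a+ creates it if it is missing.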


Some observations:

 

ENTER can be made up of TWO bytes / TWO characters.

 

In Windows, ENTER is  CR LF  or \r \n  -- CR is short for CARRIAGE RETURN (move cursor to beginning of line), LF is short for LINE FEED  (advance one line).

 

In Linux, ENTER is just LF  or \n  - basically Linux assumes that if the document says advance one line, you also mean to "snap" the write position to the left, start of line.

 

Very old Macs (Mac OS 9 and before) used only CR / \r; since Mac OS X they use the Linux convention ( LF  \n )

 

When you use .readline() to read a single line in text mode, by default Python will look for ENTER in whatever form it could be, and if there are both characters CR and LF ( \r\n ) Python will remove the \r and put only \n at the end of the string.

When you write with .write() or .writelines() (Python has no writeline()), by default Python replaces each \n with os.linesep, a constant that varies depending on the operating system: \r\n on Windows, \n on Linux.

 

So for example, if you open a text file created in Windows, read it line by line using readline() and write these lines directly to another file, you may not get an identical copy of the original file, because Python has stripped the CR (\r) characters from the end of each line you read.

If you open a text file made in Windows which uses \r\n but you run Python under Linux, then write() will output \n. If you run Python under Windows, os.linesep will be \r\n, so write() will turn each \n back into \r\n.
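
To see the stripping for yourself, here is a small sketch. It assumes a scratch file in a temporary folder (the file name is invented) and writes the bytes in binary mode so the CR LF endings really are on disk:

```python
import os
import tempfile

# Hypothetical scratch file for the demo.
path = os.path.join(tempfile.mkdtemp(), "windows.txt")

# Write raw bytes so the file really contains Windows-style CR LF endings.
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

# Default text mode (newline=None): CR LF arrives as a single "\n".
with open(path, "r") as f:
    translated = f.readline()

# Binary mode shows what is actually stored on disk.
with open(path, "rb") as f:
    raw = f.readline()

print(repr(translated))   # 'one\n'
print(repr(raw))          # b'one\r\n'
print(repr(os.linesep))   # depends on the operating system
```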

 

You can tell Python to only consider some combinations as new line when you use the open function, see  https://docs.python.org/3/library/functions.html#open

 

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

 

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
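
As a rough illustration of the newline parameter described above, this sketch copies a file with mixed line endings using newline='' on both sides, so nothing gets translated. The file names are made up and the files live in a temporary folder:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, "src.txt")   # hypothetical input file
dst = os.path.join(folder, "dst.txt")   # hypothetical output file

# Mixed endings on purpose: CR LF, LF, CR LF.
with open(src, "wb") as f:
    f.write(b"one\r\ntwo\nthree\r\n")

# newline="" keeps line endings untranslated when reading and when writing.
with open(src, "r", newline="") as fin, open(dst, "w", newline="") as fout:
    for line in fin:
        fout.write(line)

with open(src, "rb") as f:
    original = f.read()
with open(dst, "rb") as f:
    copy = f.read()

print(original == copy)   # True: the copy is byte-for-byte identical
```

With the default newline=None on both sides the copy would only match if every line ending in the source already equals os.linesep.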

 

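PS... all this assumes the text file is written in a regular codepage or UTF-8. If the file is UTF-16 or UTF-32, looking for \r and \n in the raw bytes no longer works, as each character uses multiple bytes in those encodings, so you have to pass the right encoding to open() and let Python decode the text .... but now we're getting into messy stuff.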
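
For what it's worth, text mode copes with UTF-16 as long as you tell open() the encoding; it is only byte-level scanning that breaks. A small sketch, assuming a scratch file in a temporary folder (the name is invented):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "utf16.txt")  # hypothetical file

# Write UTF-16 text with Windows line endings.
with open(path, "w", encoding="utf-16", newline="\r\n") as f:
    f.write("alpha\nbeta\n")

# In the raw bytes, CR and LF are never adjacent: each one is padded
# with a zero byte, so searching for b"\r\n" finds nothing.
with open(path, "rb") as f:
    raw = f.read()

# Text mode decodes first and only then looks for line endings.
with open(path, "r", encoding="utf-16") as f:
    lines = f.readlines()

print(b"\r\n" in raw)   # False
print(lines)            # ['alpha\n', 'beta\n']
```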


15 hours ago, mariushm said:

Some observations: [...]

The files are written by Python.


19 hours ago, tikker said:

Probably because you are opening it in append mode, which starts at the end of the file.

 

https://stackoverflow.com/questions/1466000/difference-between-modes-a-a-w-w-and-r-in-built-in-open-function

 

If you want to read it, open it in the "r" mode.

I want to both read it and append to it.


18 hours ago, C2dan88 said:

Because with a+ the file is opened for reading and writing but the pointer is placed at the end of the file. Calling f.readlines() will not return anything, even if the file contains data.

 

With r+ mode the pointer is placed at the beginning of the file. Calling f.readlines() will return all the data in the file.

OK, but since I want to both read it and append to it, I felt that opening it with a+ would be more efficient.


1 hour ago, Wictorian said:

I want to both read it and append to it.

Then you can use r+ as mentioned above and first read everything, followed by appending what you want to append.
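
Something along these lines should do it. This is only a sketch with a made-up scratch file; the explicit seek to the end is a precaution rather than something readlines() strictly requires:

```python
import os
import tempfile

# Hypothetical scratch file with some existing content.
path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "w") as f:
    f.write("first\nsecond\n")

with open(path, "r+") as f:
    existing = f.readlines()      # read everything from the start
    f.seek(0, os.SEEK_END)        # make sure we are at the end before writing
    f.write("third\n")            # lands after the old content

with open(path, "r") as f:
    final = f.read()

print(existing)      # ['first\n', 'second\n']
print(repr(final))   # 'first\nsecond\nthird\n'
```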


You have a couple of options. The first one:

1. open the file for reading
2. read all the lines into memory
3. close the file
4. do something with the lines
5. open the file for appending (which sets the file pointer to the end of the file)
6. write your new lines
7. close the file
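
In Python the steps above could look roughly like this. The scratch file and the upper-casing step are placeholders for whatever you actually want to do with the lines:

```python
import os
import tempfile

# Hypothetical scratch file with some existing content.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("first\nsecond\n")

# Open for reading, pull every line into memory, close.
with open(path, "r") as f:
    lines = f.readlines()

# Do something with the lines (placeholder processing).
new_lines = [line.upper() for line in lines]

# Open for appending, write the new lines, close.
with open(path, "a") as f:
    f.writelines(new_lines)

with open(path, "r") as f:
    result = f.readlines()

print(result)   # ['first\n', 'second\n', 'FIRST\n', 'SECOND\n']
```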

 

The downside of this method is that you have to store ALL lines in memory. If your text file is 100 MB, your Python program will use 100 MB+ of memory storing millions of lines of text in memory.

 

The other option is to not use readline or readlines, and determine where each line ends by yourself.

 

Open the file for read / write. Determine the file size and store that somewhere.

Your function will read data from the file until it reaches the file size value stored when you opened the file. You don't want to read until the end of file, because in the meantime you may have appended lines to the end of the file.

You will have two offsets: offset_read, which is initially 0 because you start reading from the beginning of the file, and offset_write, which is equal to the file size because when you write, you want to seek to the end of the file. Or you can simply use seek with the whence parameter set to end of file, e.g. seek(0, 2) means seek to 0 bytes before the end of the file; see https://www.tutorialspoint.com/python/file_seek.htm

 

Each time you try to read a line, you go byte by byte until you find the CR LF or LF sequence, and that's your line of text.
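
A rough Python sketch of the same idea. As a shortcut it does not scan byte by byte: it opens the file in binary mode and uses readline() with a size limit, which stops at the first LF or at the original end of file, whichever comes first. The scratch file and the "line " prefix are invented for the demo:

```python
import os
import tempfile

# Hypothetical scratch file; written in binary so offsets are exact bytes.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"one\ntwo\nthree\n")

with open(path, "r+b") as f:
    # Remember where the file ended before we started appending.
    original_size = os.fstat(f.fileno()).st_size
    offset_read = 0

    while offset_read < original_size:
        # Writes move the file position, so restore the read offset each time.
        f.seek(offset_read)
        # Never read past the original end of the file.
        line = f.readline(original_size - offset_read)
        offset_read += len(line)

        # Append at the current end of the file (whence=2).
        f.seek(0, os.SEEK_END)
        f.write(b"line " + line)

with open(path, "rb") as f:
    result = f.read()

print(result)   # b'one\ntwo\nthree\nline one\nline two\nline three\n'
```

Only one line is held in memory at a time, so this scales to big files, at the cost of a seek before every read and every write.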

 

Here's a crude example in PHP. It reads a text file line by line, pulling in at most 512 bytes from the file at a time so that only a small buffer is held in memory, and after every line read it goes to the end of the file and appends the line it just read (prefixed with "line "), basically doubling the file.

 

<?php
$buffer = ''; // buffer to hold a small amount of data from the file at a time
$offset = 0;  // offset in the file
$fsize  = 0;  // file size 

function fill_buffer() {
	global $h,$offset,$buffer,$fsize;
	if ($offset >= $fsize) return 0; // if we reached the original end of file, we can't refill the buffer, nothing to fill with
	$max_size = $fsize-$offset;
	if ($max_size>512) $max_size = 512;
	echo "reading $max_size bytes.\n";
	// each time we read more data from file, we have to set our read pointer there, because we don't know 
	// if an append operation changed it or not
	fseek($h,$offset,SEEK_SET); 
	$read = fread($h,$max_size);
	$offset += strlen($read);
	$buffer .= $read;
	return strlen($read);
}

function read_line() {
	global $buffer;
	// is there a line terminator in the existing buffer? If not, it may be a longer line or the file may not end with a line ending
	$position = strpos($buffer,"\n");
	if ($position===FALSE) {
		$continue = TRUE;
		while ($continue==true) {
			$bytes_read = fill_buffer();
			if ($bytes_read==0) {
				$continue=FALSE; // we reached the end of file, so stop trying to read more data
			} else {
				// is there a new line terminator in the new buffer? If so, stop reading more than needed.
				$position = strpos($buffer,"\n");
				if ($position!==FALSE) $continue=FALSE;
			}
		}
	}
	if ($position!==FALSE) {
		// return only characters up to \n
		$line = substr($buffer,0,$position+1);
		$buffer = substr($buffer,$position+1);
	} else {
		// return the whole buffer, because we tried to fill buffer until a \n was found but had no luck and we reached end of file
		$line = $buffer;
		$buffer = '';
	}
	return $line;
}


$h = fopen('c:/temp/test.txt','r+');	// open file in read/write mode
$fileinfo = fstat($h);
$fsize = $fileinfo['size'];
echo "File size: $fsize \n";

$line = read_line();
while ($line!='') {
	echo $line;
	fseek($h,0,SEEK_END);
	fwrite($h,"line ".$line);
	$line = read_line();
}


fclose($h);

 
