whiteGloveReview

Handling ' & " in a text file in C++


Posted · Original Poster (OP)

Hi all, I'm writing a little C++ program that reads a text file and reports some statistics on it. It parses the file character by character and prints each character as it goes, but I've noticed that it doesn't handle apostrophes and quotation marks (possibly semicolons as well, although I haven't noticed anything like this happening with them). I fed it the first Harry Potter book because it's a large file that I thought it could handle pretty well. Before that, I fed it a list of different characters (every character on the keyboard), and it seemed to handle them fine. Any ideas why this bug is occurring?

Here's an excerpt from the book:

“Oh, I will,” said Harry, and they were surprised at 
the grin that was spreading over his face. “ They don’t 
know we’re not allowed to use magic at home. I’m 
going to have a lot of fun with Dudley this summer...” 

And here's the same excerpt after the text file was run through the program:

ôOh, I will,ö said Harry, and they were surprised at
the grin that was spreading over his face. ô They donÆt
know weÆre not allowed to use magic at home. IÆm
going to have a lot of fun with Dudley this summer...ö
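The garbled characters follow a recognizable pattern. As a hedged guess, assuming the file is saved in Windows-1252 ("ANSI" on a Western-European Windows install) and printed to a Windows console using code page 437: the bytes 0x92, 0x93 and 0x94 are ’, “ and ” in Windows-1252, but Æ, ô and ö in code page 437, which matches the output above one-for-one. A UTF-8 file would instead turn each quote into three garbage characters, since “ is stored as the three bytes E2 80 9C. A minimal sketch for checking this is to list every byte above 0x7F (anything outside ASCII) with its position; `find_non_ascii` and `NonAscii` are hypothetical names, not part of the original program.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One non-ASCII byte and where it occurs in the text.
struct NonAscii {
    std::size_t offset;   // byte offset into the text
    unsigned int value;   // byte value, 0x80..0xFF
};

// Returns every byte outside the 7-bit ASCII range (value > 0x7F).
// Each byte is read through unsigned char because plain char is
// signed on many compilers, where 0x93 would come out negative.
std::vector<NonAscii> find_non_ascii(const std::string& text) {
    std::vector<NonAscii> found;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(text[i]);
        if (byte > 0x7F) {
            found.push_back({i, byte});
        }
    }
    return found;
}
```

Reading the whole file into a string (for example with a `std::ifstream` opened in `std::ios::binary` mode and `std::istreambuf_iterator<char>`) and printing each result in hex should show 0x93/0x94 around the dialogue if the Windows-1252 guess is right, and runs like E2 80 9C if the file is UTF-8.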

 

Posted · Original Poster (OP)
Just now, fizzlesticks said:

Need to see the code to help.

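	#include <fstream>
	#include <iostream>
	#include <string>
	using namespace std;

	int main() {
		// Declarations weren't shown in the post; these types are assumed.
		char thisis;
		string apen;
		long charcount = 0, linecount = 0, wordcount = 0;

		ifstream doc0("harrypotter.txt");
		// Loop on the read itself rather than eof(): eof() only turns true after
		// a read has already failed, so the old loop counted and printed the
		// last character twice.
		while (doc0.get(thisis)) {
			charcount++;
			if (thisis == '\n') linecount++;
			cout << thisis;
		}
		doc0.close();

		ifstream doc1("harrypotter.txt");
		// >> skips whitespace and reads one word; the outer eof() loop was redundant.
		while (doc1 >> apen) {
			wordcount++;
		}
		doc1.close();

		cout << "\nchars: " << charcount << " lines: " << linecount
		     << " words: " << wordcount << endl;
	}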
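Separately from the display problem, `charcount` counts bytes, not characters. If the file is saved as UTF-8, each curly quote takes three bytes, so the count comes out too high. A minimal sketch, assuming UTF-8 input: count only the bytes that start a character, skipping continuation bytes, which always have the bit pattern 10xxxxxx. `count_utf8_chars` is a hypothetical helper; for a single-byte encoding such as Windows-1252 the plain byte count is already correct.

```cpp
#include <cstddef>
#include <string>

// Counts characters (code points) in UTF-8 text by skipping
// continuation bytes (10xxxxxx); every other byte starts a new
// character. Assumes the input is valid UTF-8.
std::size_t count_utf8_chars(const std::string& text) {
    std::size_t count = 0;
    for (char c : text) {
        unsigned char byte = static_cast<unsigned char>(c);
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, the UTF-8 string “Oh” is 8 bytes long but counts as 4 characters.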

 

4 minutes ago, whiteGloveReview said:

So I can just run them through a find-and-replace to change them to normal characters and it'll be fine?

That would work, but if there are Unicode quotes there are probably other Unicode characters too, and you'll have to go through the entire file to find them all.
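A find-and-replace pass like this can be sketched in C++. This assumes the file is Windows-1252 (as the ô/ö/Æ output suggests) and maps only the typographic punctuation that has an obvious ASCII stand-in; any other byte above 0x7F is left in place and counted, so the remaining non-ASCII characters can still be found and handled by hand. `fold_cp1252_punctuation` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Replaces Windows-1252 typographic punctuation with ASCII equivalents.
// Any other byte above 0x7F is copied unchanged and counted in
// `leftover` so it can be reviewed manually.
std::string fold_cp1252_punctuation(const std::string& in, std::size_t& leftover) {
    std::string out;
    out.reserve(in.size());
    leftover = 0;
    for (char c : in) {
        unsigned char byte = static_cast<unsigned char>(c);
        switch (byte) {
            case 0x91:                       // left single quote
            case 0x92: out += '\''; break;   // right single quote / apostrophe
            case 0x93:                       // left double quote
            case 0x94: out += '"'; break;    // right double quote
            case 0x96:                       // en dash
            case 0x97: out += '-'; break;    // em dash
            case 0x85: out += "..."; break;  // horizontal ellipsis
            default:
                if (byte > 0x7F) ++leftover; // unmapped non-ASCII byte
                out += c;
                break;
        }
    }
    return out;
}
```

A non-zero `leftover` after processing the whole book means some characters (accented letters, for instance) still need a decision.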


1474412270.2748842

Posted · Original Poster (OP)
Just now, fizzlesticks said:

That would work, but if there are Unicode quotes there are probably other Unicode characters too, and you'll have to go through the entire file to find them all.

Could I just save the text as, say, ANSI or something?

5 minutes ago, whiteGloveReview said:

Could I just save the text as, say, ANSI or something?

That would probably just delete all the Unicode characters rather than convert them to ASCII. But depending on what you're doing, that may work fine.



Your problem is with the charset. Try opening the file with a different charset until you find the one that matches.
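One rough way to narrow down the charset before trying them one by one is to check whether the bytes form valid UTF-8. A sketch, assuming the file is either UTF-8 or a single-byte legacy code page such as Windows-1252: legacy-encoded text containing bytes above 0x7F rarely happens to be valid UTF-8, so a failed check points toward a legacy code page. This is a heuristic rather than a full validator (it does not reject overlong forms or surrogate code points), and `looks_like_utf8` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Heuristic: true if every multi-byte sequence has a valid lead byte
// followed by the right number of continuation bytes (10xxxxxx).
// Does not check for overlong forms or surrogates.
bool looks_like_utf8(const std::string& text) {
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char b = static_cast<unsigned char>(text[i]);
        std::size_t extra;
        if (b < 0x80)                extra = 0;  // ASCII
        else if ((b & 0xE0) == 0xC0) extra = 1;  // 110xxxxx: 2-byte sequence
        else if ((b & 0xF0) == 0xE0) extra = 2;  // 1110xxxx: 3-byte sequence
        else if ((b & 0xF8) == 0xF0) extra = 3;  // 11110xxx: 4-byte sequence
        else return false;                       // stray continuation or invalid byte
        if (i + extra >= text.size()) return false;  // sequence cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(text[i + k]);
            if ((c & 0xC0) != 0x80) return false;   // expected a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

If this returns false on the book, the file is almost certainly in a single-byte code page, and Windows-1252 is the usual suspect on Windows.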


The best way to measure the quality of a piece of code is "Oh F*** "s per line

7 hours ago, fizzlesticks said:

That would probably just delete all the Unicode characters rather than convert them to ASCII. But depending on what you're doing, that may work fine.

I imagine most modern text editors are smart enough to deal with converting applicable characters to their ASCII counterparts.

 

Here is the Encoding menu in Notepad++, for example:

[Screenshot: the Encoding menu in Notepad++]

 

You'd select "Convert to ANSI", save it, and *Boom* - done.
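One hedged caveat: on a Western-European Windows install, "ANSI" normally means Windows-1252, which keeps its own curly quotes (bytes 0x91 to 0x94), and the ô/ö/Æ output in the first post suggests the file may already be in that encoding. If the goal is plain ASCII, the punctuation may still need folding. A minimal sketch, assuming the text is UTF-8 and that only the common typographic punctuation matters; `fold_utf8_punctuation` is a hypothetical helper, and every other character is passed through unchanged.

```cpp
#include <cstddef>
#include <string>

// Replaces common UTF-8 typographic punctuation with ASCII.
// All of the handled characters are three-byte sequences starting E2 80.
std::string fold_utf8_punctuation(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        if (i + 2 < in.size() &&
            static_cast<unsigned char>(in[i]) == 0xE2 &&
            static_cast<unsigned char>(in[i + 1]) == 0x80) {
            const char* ascii = nullptr;
            switch (static_cast<unsigned char>(in[i + 2])) {
                case 0x98: case 0x99: ascii = "'"; break;   // U+2018, U+2019
                case 0x9C: case 0x9D: ascii = "\""; break;  // U+201C, U+201D
                case 0x93: case 0x94: ascii = "-"; break;   // U+2013, U+2014
                case 0xA6: ascii = "..."; break;            // U+2026
            }
            if (ascii != nullptr) {
                out += ascii;
                i += 3;  // skip the whole three-byte sequence
                continue;
            }
        }
        out += in[i];  // anything else is copied byte for byte
        ++i;
    }
    return out;
}
```

Running this on the UTF-8 original before saving keeps the apostrophes and quotes as plain ASCII, whatever code page the console uses.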

