How would I do this. (Working with Binary

Matom3

If someone could tell me how to solve my problem that would be great. Thanks!!!

 

I am working on a project (which I will not and cannot disclose information about).

I have a PDF as well as an ASCII file which has the raw binary expressed as bits ("0" and "1"). How could I translate that file back to text efficiently?

The reason I say efficiently is that the ASCII file is 22600 pages long, full of binary code expressed in bits.

 

I am running OS X El Capitan.

 

Thank you to all who can help.

Are you trying to extract the text from a PDF file? I think you'd probably need to know the structure of the PDF file itself to figure out how to extract text from it.

 

For example, a Word .docx file is actually a .zip archive with a different extension, so the .xml file containing the document text can be easily extracted.

Speedtests

WiFi - 7ms, 22Mb down, 10Mb up

Ethernet - 6ms, 47.5Mb down, 9.7Mb up

 

Rigs

Spoiler

 Type            Desktop

 OS              Windows 10 Pro

 CPU             i5-4430S

 RAM             8GB CORSAIR XMS3 (2x4gb)

 Cooler          LC Power LC-CC-97 65W

 Motherboard     ASUS H81M-PLUS

 GPU             GeForce GTX 1060

 Storage         120GB Sandisk SSD (boot), 750GB Seagate 2.5" (storage), 500GB Seagate 2.5" SSHD (cache)

 

Spoiler

Type            Server

OS              Ubuntu 14.04 LTS

CPU             Core 2 Duo E6320

RAM             2GB Non-ECC

Motherboard     ASUS P5VD2-MX SE

Storage         RAID 1: 250GB WD Blue and Seagate Barracuda

Uses            Webserver, NAS, Mediaserver, Database Server

 

Quotes of Fame

On 8/27/2015 at 10:09 AM, Drixen said:

Linus is light years ahead a lot of other YouTubers, he isn't just an average YouTuber.. he's legitimately, legit.

On 10/11/2015 at 11:36 AM, Geralt said:

When something is worth doing, it's worth overdoing.

On 6/22/2016 at 10:05 AM, trag1c said:

It's completely blown out of proportion. Also if you're the least bit worried about data gathering then you should go live in a cave a 1000Km from the nearest establishment simply because every device and every entity gathers information these days. In the current era privacy is just fallacy and nothing more.

 

1 minute ago, burnttoastnice said:

Are you trying to extract the text from a PDF file? I think you'd probably need to know the structure of the PDF file itself to figure out how to extract text from it.

 

For example, a Word document is actually a .zip file renamed to .docx - the .xml file containing the document can be easily extracted

Nah. I converted it into a plain text file to make it easier to work with.

5 minutes ago, Matom3 said:

Nah. I converted it into a plain text file to make it easier to work with.

Oh. If I were converting binary back to ASCII text, I'd probably look up a character code table and make a small program to convert each binary byte to an ASCII character.
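As a rough sketch of what such a small program could look like in C++: this assumes the text file holds the characters '0' and '1' in groups of eight, most significant bit first, with optional whitespace between groups. Those are assumptions about the file, and `bits_to_text` is a made-up helper name, not something from a library.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: turn a string of '0'/'1' characters into the bytes they spell.
// Assumes 8 bits per character, most significant bit first; whitespace is ignored.
std::string bits_to_text(const std::string& bits) {
    std::string out;
    unsigned char value = 0;  // byte currently being assembled
    int count = 0;            // digits collected so far for this byte
    for (char c : bits) {
        if (c == ' ' || c == '\n' || c == '\r' || c == '\t') continue;
        if (c != '0' && c != '1') {
            throw std::invalid_argument("expected only 0, 1, or whitespace");
        }
        // Shift the previous digits left and append the new one.
        value = static_cast<unsigned char>((value << 1) | (c - '0'));
        if (++count == 8) {
            out.push_back(static_cast<char>(value));
            value = 0;
            count = 0;
        }
    }
    if (count != 0) {
        throw std::invalid_argument("number of bits is not a multiple of 8");
    }
    return out;
}
```

For example, `bits_to_text("01001000 01101001")` returns "Hi". No character table is involved: the assembled byte value is the character code itself.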

1 hour ago, burnttoastnice said:

Oh. If I were converting binary back to ASCII text, I'd probably look up a character code table and make a small program to convert each binary byte to an ASCII character.

No lookup table is needed. Binary mode makes no difference when reading text: assuming there are no multibyte characters, reading a single binary byte into a char is as simple as reading it as if it were text.

 

You can try this for yourself with this C++ snippet. It writes a single byte, the binary representation of the ASCII character '<', to a file in binary mode, then reads it back into a char, which holds the same character.

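	// Write one raw byte: 0b00111100 is 0x3C, the ASCII code for '<'.
	std::fstream f("file.txt", std::ios::binary | std::ios::out);
	char a = 0b00111100;
	f.write(&a, sizeof(a));  // one byte; sizeof(1) would be sizeof(int)
	f.close();

	// Read the byte back; b now holds '<'.
	f.open("file.txt", std::ios::binary | std::ios::in);
	char b;
	f.read(&b, sizeof(b));
	f.close();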

 

CPU: Intel i7 - 5820k @ 4.5GHz, Cooler: Corsair H80i, Motherboard: MSI X99S Gaming 7, RAM: Corsair Vengeance LPX 32GB DDR4 2666MHz CL16,

GPU: ASUS GTX 980 Strix, Case: Corsair 900D, PSU: Corsair AX860i 860W, Keyboard: Logitech G19, Mouse: Corsair M95, Storage: Intel 730 Series 480GB SSD, WD 1.5TB Black

Display: BenQ XL2730Z 2560x1440 144Hz

31 minutes ago, trag1c said:

No lookup table is needed. Binary mode makes no difference when reading text: assuming there are no multibyte characters, reading a single binary byte into a char is as simple as reading it as if it were text.

 

You can try this for yourself with this C++ snippet. It writes a single byte, the binary representation of the ASCII character '<', to a file in binary mode, then reads it back into a char, which holds the same character.


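	// Write one raw byte: 0b00111100 is 0x3C, the ASCII code for '<'.
	std::fstream f("file.txt", std::ios::binary | std::ios::out);
	char a = 0b00111100;
	f.write(&a, sizeof(a));  // one byte; sizeof(1) would be sizeof(int)
	f.close();

	// Read the byte back; b now holds '<'.
	f.open("file.txt", std::ios::binary | std::ios::in);
	char b;
	f.read(&b, sizeof(b));
	f.close();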

 

Thank you very much for supplying me with this.

However I am having trouble running your script. I must be doing something wrong. How would you run the script?

Thanks

7 hours ago, Matom3 said:

Thank you very much for supplying me with this.

However I am having trouble running your script. I must be doing something wrong. How would you run the script?

Thanks

It's not a script. It's C++ code, so you would need a C++ compiler to build it into an executable. It's also incomplete: the snippet needs to be wrapped in a function (a name, parameters, and {} around it), and the program needs a main function.

 

Also, in its current state it's not exactly useful, as it only reads 1 byte of data. It was more for demonstration purposes, to show how to process text stored in a binary file through programming. It would be very simple to expand it to process and transcode any file from binary to text or vice versa. The idea is the same regardless of programming language.
