
Afternoon all,

I've got a 1,114-page PDF that is a mixture of parts diagrams and text.

I'm trying to extract the information, but when I try to import the data through Excel it eats all of my RAM (32GB DDR4 3200MHz).

Is there a way to discard the pages with pictures on them without going through them one by one?

Or is there a program I can use that works better than importing through Excel?

TIA
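
One way to skip the diagram pages without opening them one by one is to look at how much text each page yields: pages that are mostly drawing tend to extract to almost nothing. A minimal Python sketch, assuming the third-party pypdf library is installed and that a 200-character cutoff (a guess, tune it on a few pages) separates diagram pages from table pages. `iter_page_texts` and `text_pages` are names made up for this example:

```python
from typing import Iterable, Iterator, Tuple


def iter_page_texts(pdf_path: str) -> Iterator[str]:
    """Yield the extracted text of each page, one page at a time."""
    # Third-party dependency (pip install pypdf), imported lazily so the
    # filter below can be tried without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    for page in reader.pages:
        # extract_text() may return an empty result for image-only pages.
        yield page.extract_text() or ""


def text_pages(page_texts: Iterable[str], min_chars: int = 200) -> Iterator[Tuple[int, str]]:
    """Yield (1-based page number, text) for pages with enough text.

    min_chars is an assumed threshold, not a value from the thread.
    """
    for number, text in enumerate(page_texts, start=1):
        if len(text.strip()) >= min_chars:
            yield number, text
```

Diagram pages with lots of callout labels could still clear the cutoff, so it is worth spot-checking a handful of pages before trusting it on the whole file.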

Ryzen 9 7900X
ASRock X670E PG Lightning
32GB G.Skill 6000MHz DDR5
1TB Samsung 990 Pro
RDNA 2 iGPU

Link to comment
https://linustechtips.com/topic/1482507-pulling-data-from-large-pdf-to-xlsx/

2 minutes ago, OddOod said:

I was unaware you *could* import from PDF to Excel.

What data are you trying to pull? Is regex a possibility?

Yep, you can.

It's an automotive parts PDF. Screenshot below of what it contains.

 

Trying to pull just the text from the part number column. 

 

[Screenshot: sample page from the parts PDF]
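
If the part numbers follow a regular pattern, a regex over the extracted page text may be enough. A rough sketch, assuming part numbers look like hyphen-separated groups of capitals and digits (e.g. `90001-PA5-000`). That format is a guess, since the screenshot isn't reproduced here, so adjust `PART_NUMBER` to match the real column:

```python
import re
from typing import List

# Assumed part-number shape: two or more groups of capitals/digits joined
# by hyphens, e.g. "90001-PA5-000". Hypothetical; change to suit the PDF.
PART_NUMBER = re.compile(r"\b[A-Z0-9]{2,}(?:-[A-Z0-9]+)+\b")


def extract_part_numbers(text: str, pattern: "re.Pattern[str]" = PART_NUMBER) -> List[str]:
    """Return every substring of text that matches the part-number pattern."""
    return pattern.findall(text)


if __name__ == "__main__":
    page = "1 90001-PA5-000 BOLT, FLANGE 2\n2 17220-RNA-A00 ELEMENT, AIR CLEANER 1"
    for part in extract_part_numbers(page):
        print(part)
```

Plain words and the quantity columns are ignored because the pattern requires at least one hyphen.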


1 minute ago, jaslion said:

Split the PDF into sections.

Try 100 pages at a time (of course, stop at logical intervals).

What are you using for this?

Adobe Acrobat? If so, 100 pages is ABSOLUTELY the max.

I tried 50 pages with Acrobat and the program kept crashing.

 

I was thinking of splitting the PDF into multiple PDF files and then using the Excel data import function. 
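
For the splitting idea, a short script could cut the file into fixed-size chunks. A sketch assuming the third-party pypdf library; `chunk_ranges` and `split_pdf` are illustrative names and the output file naming is arbitrary:

```python
from pathlib import Path
from typing import List, Tuple


def chunk_ranges(total_pages: int, chunk_size: int = 100) -> List[Tuple[int, int]]:
    """Return zero-based, half-open (start, stop) page ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(src: str, out_dir: str, chunk_size: int = 100) -> List[Path]:
    """Write src out as several smaller PDFs and return their paths."""
    # Third-party dependency (pip install pypdf), imported lazily.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start, stop in chunk_ranges(len(reader.pages), chunk_size):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # File names use 1-based page numbers, e.g. part_0001-0100.pdf.
        target = target_dir / f"part_{start + 1:04d}-{stop:04d}.pdf"
        with target.open("wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

For a 1114-page file, `chunk_ranges(1114, 100)` gives 12 ranges, the last covering pages 1101 to 1114. Fixed-size chunks won't line up with the "logical intervals" suggested above, so the boundaries may need adjusting by hand.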


1 minute ago, MrBaker89 said:

Tried 50 pages with Acrobat and the program kept crashing.

I was thinking of splitting the PDF into multiple PDF files and then using the Excel data import function.

Oh, if you're doing it in Excel, try it with a 10-page sample, set the rules up and let it rip.


1 minute ago, jaslion said:

Oh, if you're doing it in Excel, try it with a 10-page sample, set the rules up and let it rip.

Rules? Sounds like I'm missing something here lol 



17 minutes ago, MrBaker89 said:

Rules? Sounds like I'm missing something here lol 

The data load in Excel is done with Power Query.

You can define a ton of rules for it to follow if need be, like only fetching data if column X appears.
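
The same kind of rule can be written outside Power Query too. A small Python sketch of "only fetch data if column X appears", assuming the table header contains the words `Part Number` (a guess at the real header text) and that the page text has already been extracted to a string. `rows_under_header` is a made-up helper:

```python
from typing import List


def rows_under_header(page_text: str, header: str = "Part Number") -> List[str]:
    """Return the non-blank lines that follow the header line.

    If the header (matched case-insensitively) never appears, the page is
    treated as having no table and an empty list is returned.
    """
    lines = [line.strip() for line in page_text.splitlines()]
    for index, line in enumerate(lines):
        if header.lower() in line.lower():
            return [row for row in lines[index + 1:] if row]
    return []


if __name__ == "__main__":
    table_page = "Fig. 12\nNo. Part Number Description\n1 90001-PA5-000 BOLT\n"
    diagram_page = "Fig. 12\nENGINE ASSEMBLY\n"
    print(rows_under_header(table_page))
    print(rows_under_header(diagram_page))
```

Pages without the header drop out on their own, which also takes care of most diagram-only pages.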


3 minutes ago, jaslion said:

The data load in Excel is done with Power Query.

You can define a ton of rules for it to follow if need be, like only fetching data if column X appears.

I'll do some research on rules. Thank you. 
