
Why PDFs exist

Watching the WAN show, I was shocked to hear @LinusTech and Luke hating on PDF files and wondering why they exist, with Luke offering the bizarro-world comment that you can't edit PDF files. (To be fair, Linus gave a rough answer that was vaguely in the right direction, but missed a critical element.)

 

In simplest terms, PDFs exist because if one company had proprietary control over all document interchange, that would be evil. Microsoft was working hard to be that evil company, and most Windows losers [oops] were happy to send each other .doc files when they created a document and wanted to send it electronically rather than by snail mail. Sending .doc files worked fine, until you sent one to a Mac user, or a Unix user, or an Amiga user, etc. So (in the short version) Adobe created an open standard for document interchange that anyone could read anywhere, with a free reader provided by Adobe or with one developed by anyone who could read the open standard.

 

Sure, there were early efforts on the Mac, Unix, and the other OSes in play back then to develop software that could read and write .doc files, but they were clunky and worked poorly, and Microsoft actively "upgraded" its proprietary format to frustrate exactly this sort of development.

 

So the reason I say I was shocked is that Linus is usually more savvy about giant corporations trying to control information through proprietary standards, and about the need for open standards to avoid that nonsense.

 

To be fair, no, PDF was never intended to be an editable format. Luke was on the right track here when he said it's like a digital printer. When Word was young, the endpoint of a document was printing it out. But sometimes there was a need for the endpoint to be sending it electronically instead. And, like a printout, there was no goal for the document to be editable. The goal was to share a "printout" electronically, with anyone, whether they had purchased Word from giant evil Microsoft or not.

 

A lot of younger people today don't appreciate how much more evil Microsoft was in the early years. The browser wars are a well-known example, but that wasn't an isolated event. Microsoft tried to do this whenever they could, and document exchange was absolutely not an exception to their desire to be evil and monopolistically control everything that happened on computers.

 

The longer story of why PDFs exist (in the form that they exist) is perhaps more technically interesting. Before PDFs, the preferred non-proprietary format was another open Adobe document format, PostScript. Luke would really appreciate this, as would any programmer, because PostScript was (and still is) FREAKING AWESOME. PostScript, for those who don't know, was a standard for getting graphical information from document software to a printer, but the really cool part is that it's a complete programming language, with powerful graphics and text capabilities built in. It has loops and functions and complex data types, and it works as an interpreted language. All in reverse Polish notation, which is trippy, but really fun to program in. Reverse Polish notation means that it is a stack-based language: objects are pushed onto a stack until a command is reached, and then the command is executed, using up data from the stack as needed. As a trivial example, consider this code fragment that draws a grey box with a black outline:

%!PS
0 setgray       %set the current color to black (used later for the outline)
newpath         %start a new path
10 10 moveto    %move the current point to 10,10
100 10 lineto   %add a line segment from the current point to 100,10, and then make 100,10 the current point
100 100 lineto
10 100 lineto
closepath       %add a line segment (if needed) to make this path a closed shape
gsave           %save the current graphics state (i.e. the path we just drew, plus other things like the current color)
0.5 setgray     %set the color to 50% gray
fill            %fill the current path, and then dispose of that path
grestore        %put the last saved graphics state back (restoring the path we just used up, and the black color)
4 setlinewidth  %make the outline 4 units (points) wide
stroke          %draw the current path in the current color and dispose of the current path
showpage        %output the finished page; without it, a printer may never print anything
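To make the stack idea concrete, here is a toy evaluator, written in Python rather than PostScript, for a handful of PostScript-style operators. It's only a minimal sketch: `run_postfix` is a made-up name, only `add`, `sub`, `mul`, `div`, `dup`, `exch`, and `pop` are handled, and every number is treated as a float (real PostScript distinguishes integers from reals).

```python
# Toy illustration of the PostScript stack model: operands are pushed,
# and each operator pops what it needs and pushes its result.
# This is NOT a PostScript interpreter; only a few operator names are known.

def run_postfix(source: str) -> list[float]:
    """Evaluate whitespace-separated tokens and return the final operand stack."""
    stack: list[float] = []

    binary = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
    }

    for token in source.split():
        if token in binary:
            b = stack.pop()  # top of stack is the SECOND operand
            a = stack.pop()
            stack.append(binary[token](a, b))
        elif token == "dup":
            stack.append(stack[-1])  # copy the top element
        elif token == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]  # swap the top two
        elif token == "pop":
            stack.pop()  # discard the top element
        else:
            stack.append(float(token))  # anything else is assumed to be a number
    return stack


# (3 + 4) * 2 in postfix: operands first, operator afterwards.
print(run_postfix("3 4 add 2 mul"))  # [14.0]
print(run_postfix("10 3 sub"))       # [7.0]
```

Note how `moveto` and `lineto` in the box example work the same way: the two coordinates are already sitting on the stack when the operator runs.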

 

PostScript was great if you were a geek and wanted to write a program in 35 lines that would draw a seal, and then send that to the printer. But most PostScript was produced by printer drivers, and they were terrible: they would include in any PostScript output the entirety of the printing functions that the writers of that driver had created to "simplify" turning documents into printed output. So "Hello World" written by me, by hand, would be four lines, but written by a printer driver it could be like a megabyte, which in those days was insanely large.

 

As cool as it is that PostScript is a complete printer language, in practice it produced overly large documents that were slow to print (because the program had to be run, and computers, especially the ones they stuck inside printers, were not that fast back then). And PostScript was also being used beyond printers as a generic document exchange format (again, because sending .doc files is evil, or at least it was then). So PDF was created as a new approach to digital output from word processors: something that could be sent to printers, but could also be exchanged among people without having to pay the Micro$oft tax, and be used more efficiently to display documents without printing them.

 

PDF does still include a PostScript-like graphical language inside, which is used for all scalable graphics and as the basis for PDF's text capabilities. But PDF is also a container format that lets you include image files in a variety of formats. It specifies everything as objects, and describes which objects are on which pages, where they are on the page, and how they're arranged. It's also indexed, so that printers and readers can process it much more efficiently and (for readers in particular) jump around to different pages without having to reprocess the entire document again, as one would need to do for PostScript files. (There actually were standardized PostScript conventions, the Document Structuring Conventions, that also allowed for this by adding structure to PostScript documents in comments, and they were widely used, but kind of a hack.)
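Here's a rough Python sketch of what "indexed" means at the byte level: a tiny hand-built PDF whose `xref` table records the byte offset of every object, and a reader that starts from `startxref` at the end of the file and jumps straight to any object. The function names are hypothetical, and the reader only handles a classic single-section xref table like the one the builder writes (real files may use cross-reference streams, incremental updates, or CRLF line endings).

```python
# Sketch: write a tiny three-object PDF with an xref index, then use that
# index the way a reader does. Helper names are hypothetical.

def build_minimal_pdf() -> bytes:
    """Return a one-page, empty PDF whose xref table lists every object's byte offset."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    size = len(objects) + 1
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # object 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # the index is located from the end
    return bytes(out)


def object_offsets(pdf: bytes) -> dict[int, int]:
    """Map object numbers to byte offsets by reading startxref, then the xref table."""
    tail = pdf[-1024:]  # startxref sits near the end of the file
    marker = tail.rindex(b"startxref")
    xref_pos = int(tail[marker + len(b"startxref"):].split()[0])
    lines = pdf[xref_pos:].split(b"\n")
    if lines[0].strip() != b"xref":
        raise ValueError("expected a classic xref table")
    first, count = (int(n) for n in lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":  # "n" = object in use, "f" = free entry
            table[first + i] = int(offset)
    return table


pdf = build_minimal_pdf()
for num, offset in object_offsets(pdf).items():
    print(num, offset, pdf[offset:offset + 7])  # each slice starts with "N 0 obj"
```

The point of the design is that a reader never has to interpret page 1 to find page 50; it looks the object up in the table and seeks there.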

 

PDF added the (optional) ability to be searchable by specifying the actual text flow and rough location of all the text in a document, because without this, text was only described graphically: not necessarily in order, and not necessarily even as whole words. Later versions of PDF made it possible to begin displaying a PDF while it was still loading (in early versions the object index came last, so you couldn't display anything until you had the complete file).
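A minimal Python sketch of why text order is a problem without that extra information: the drawing operators in a content stream just place strings at coordinates, so a naive extractor has to guess reading order from positions. This assumes an uncompressed stream where every string is drawn as `x y Td (text) Tj` inside its own `BT`/`ET` block (so the coordinates are absolute); real files are usually compressed and use `TJ` arrays and font encodings, and the helper names are made up.

```python
import re

# Hypothetical pattern: "x y Td (string) Tj" with no escaped parentheses.
TEXT_RUN = re.compile(
    rb"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\s*\(([^)]*)\)\s*Tj"
)


def text_runs(content: bytes) -> list[tuple[float, float, str]]:
    """Return (x, y, text) for each matched run, in the order the stream draws them."""
    return [
        (float(x.decode()), float(y.decode()), s.decode("latin-1"))
        for x, y, s in TEXT_RUN.findall(content)
    ]


def reading_order_text(content: bytes) -> str:
    """Guess reading order: top line first (PDF y grows upward), then left to right."""
    runs = sorted(text_runs(content), key=lambda run: (-run[1], run[0]))
    return " ".join(text for _, _, text in runs)


# The stream draws "world" before "Hello"; only the coordinates say otherwise.
sample = b"""
BT /F1 12 Tf 72 700 Td (world) Tj ET
BT /F1 12 Tf 72 720 Td (Hello) Tj ET
BT /F1 12 Tf 110 700 Td (again) Tj ET
"""

print([text for _, _, text in text_runs(sample)])  # ['world', 'Hello', 'again']
print(reading_order_text(sample))                   # Hello world again
```

Sorting by position works for this toy page but falls apart with columns, rotated text, or words split into single glyphs, which is exactly the gap the optional structure information fills.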

 

So yeah, it isn't editable**. It was never supposed to be editable. It was supposed to be a way for people to look at "printouts" without printing them, to send them back and forth intact, and to not have to pay Micro$oft for the privilege of doing any of this.

 

The idea that the recipient of a file wouldn't need to have the same program that the sender used to create it is critically important in the history of software. Think of it as "right to repair", but in this case it was "right to read documents without paying a vendor tax".

 

Hey, you know what, this might make for an interesting TechQuickie. Also, I encourage Luke to spend an afternoon learning some PostScript basics. It's a very satisfying programming experience. You'll need a PostScript viewer, and these days I'm not sure what supports that.

 

**Fun fact: you can load a PDF file into a text editor, search for JPEG objects within the file, trim everything above and below any set of JPEG data, save the result to a new file, and just view it as a JPG. Because it is a JPG. Because PDF is a container format, not unlike a zip file (credit to Dan for making that comparison), but a container format crammed with document layout and presentation information. I actually do this to get images out of PDFs sometimes (modern viewers sometimes let you do this through the interface, and there are also command-line programs to extract images, but hey, I'm old school). Anyway, the point here is that, for me, PDF files are "editable" because I can do this.
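The same trick can be sketched in Python: find each stream whose dictionary names the `/DCTDecode` filter (the PDF filter name for JPEG data) and cut out the bytes between `stream` and `endstream`. This is a rough illustration with made-up helper names; it ignores `/Length`, skips JPEGs stacked under other filters, and proper tools such as Poppler's `pdfimages -j` handle the cases it misses.

```python
import re

# "stream" followed by its end-of-line, but not the tail of "endstream".
STREAM_START = re.compile(rb"(?<!end)stream\r?\n")
JPEG_SOI = b"\xff\xd8"  # every JPEG file starts with this start-of-image marker


def extract_jpegs(pdf: bytes) -> list[bytes]:
    """Return the raw bytes of each /DCTDecode stream that looks like a JPEG."""
    images = []
    for m in STREAM_START.finditer(pdf):
        # The stream dictionary lies between "N 0 obj" and the "stream" keyword.
        header_start = max(pdf.rfind(b" obj", 0, m.start()), 0)
        header = pdf[header_start:m.start()]
        end = pdf.find(b"endstream", m.end())
        if b"/DCTDecode" not in header or end == -1:
            continue
        data = pdf[m.end():end].rstrip(b"\r\n")  # drop the EOL before "endstream"
        if data.startswith(JPEG_SOI):  # skips JPEGs wrapped in another filter
            images.append(data)
    return images


def save_jpegs(pdf_path: str, prefix: str = "image") -> int:
    """Write each extracted JPEG to prefix_N.jpg and return how many were found."""
    with open(pdf_path, "rb") as f:
        images = extract_jpegs(f.read())
    for i, data in enumerate(images):
        with open(f"{prefix}_{i}.jpg", "wb") as out:
            out.write(data)
    return len(images)
```

Calling `save_jpegs("some.pdf")` on a file with plain JPEG images should leave `image_0.jpg`, `image_1.jpg`, and so on next to it, which is the "trim everything above and below" step done by a script instead of a text editor.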

 

I'm editing this to add that an absolutely critical modern use of PDFs is in printing. They are now the de facto printer language, having almost entirely replaced PostScript (it's possible that some modern printers may not even be able to print PostScript files). Sure, you can often send an image format like JPG or PNG directly, and most printers can handle it. But if you want to print a file that is going to be scaled perfectly to the resolution of the printer, it's basically guaranteed that it will be PDF. If you want to be able to scale a page up to poster size and have it not look like ass, it has to be PDF.
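Some back-of-the-envelope arithmetic behind the poster point, as a Python sketch: a raster image has a fixed number of pixels, so blowing it up spreads them thinner, while vector content is stored in points (1/72 inch) and only gets rasterized at the printer's own resolution after scaling. The 300 ppi source and 600 dpi printer are just example numbers, and the function names are made up.

```python
POINTS_PER_INCH = 72  # PDF/PostScript default user space: 1 unit = 1/72 inch


def raster_ppi(pixel_width: int, printed_width_in: float) -> float:
    """Pixels per inch a fixed-size raster image ends up with at a given print width."""
    return pixel_width / printed_width_in


def vector_device_pixels(width_pt: float, scale: float, printer_dpi: int) -> float:
    """Device pixels across a vector page: scale the coordinates first, then rasterize."""
    return width_pt * scale / POINTS_PER_INCH * printer_dpi


letter_width_in = 8.5
scan_pixels = int(letter_width_in * 300)  # a letter-width page rasterized at 300 ppi

print(raster_ppi(scan_pixels, letter_width_in))      # 300.0 at original size
print(raster_ppi(scan_pixels, letter_width_in * 4))  # 75.0 blown up 4x for a poster

# The same page as vector content (612 pt wide) on a 600 dpi printer:
print(vector_device_pixels(612, 1, 600))  # 5100.0 device pixels, 600 per inch
print(vector_device_pixels(612, 4, 600))  # 20400.0 device pixels, still 600 per inch
```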


I do a lot of PDF creation in my line of work, unfortunately, and fillable forms are a godsend. I can't imagine having everything in .doc format. So even if DLL and crew don't see the advantages of PDF, I do. Digital signatures in Word docs are kind of a joke, and controlling your document's integrity is a lot simpler thanks to enhancements available in the PDF format. I'm with team PDF.


1 minute ago, Taerin said:

I do a lot of PDF creation in my line of work, unfortunately, and fillable forms are a godsend. I can't imagine having everything in .doc format. So even if DLL and crew don't see the advantages of PDF, I do. Digital signatures in Word docs are kind of a joke, and controlling your document's integrity is a lot simpler thanks to enhancements available in the PDF format. I'm with team PDF.

Oh yeah.  I was focusing on the history lesson, not the modern use cases.

 

There are lots of reasons today why PDFs remain viable, even though Microsoft kind of won the "Word is everything" battle (and backed off from trying to block other software from writing Word files). Their inability to be edited is in fact a feature. They're hard to tamper with, there are additions to the PDF standard that make them even more tamperproof, and data can be added internally to make them suitable for archiving where proof of lack of manipulation is important.

 

However, it's fair to say that if the modern use cases were the only goals, PDF might look different than it does.  Again, that's why I focused on the history.


The same reason fax exists: to annoy the f... out of everyone who has to deal with them. Seriously, you know in corporate environments EVERYTHING is made to be a burden to the employees. Why use something fast and efficient like an email when you can have someone handwrite or type (on a typewriter) a file, then someone else digitise that file only to print it and send it via fax to someone on the other side, who will repeat the same steps backwards?

 

The same happens with PDF, minus the fax part. Someone creates the file and sends it to someone else, then if changes have to be made the other party has to type the entire file again plus the changes and resend it. That's it.


2 minutes ago, Caroline said:

The same reason Fax exists, to annoy the f... out of everyone that has to deal with them,

Maybe you're just trolling, but for the record, modern fax machines were introduced in 1964, and the Internet didn't see widespread business use until, let's say, 1998 or so. Faxes were created to exchange documents over distance faster than postal mail or a professional courier service.


As a guy who has had a lot to do with archiving, I'd also like to point out that there isn't just one type of PDF. PDFs come in a range of different formats for different types of jobs. PDF/A (and its variants) is used for long-term storage: all the information, including things like fonts and colour profiles, is stored within the file. Compliant PDF/A viewers are required to ignore extra information that doesn't conform to the PDF/A standard, as well as not substitute system defaults for the embedded ones.

 

PDFs can be trivial to edit, and I imagine it's not impossible to bypass protected PDFs to view their contents, but the real security is being able to digitally sign the PDF so copies of it can be compared to the original to ensure no changes have taken place.
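The "compare a copy to the original" part can be illustrated with a plain checksum: if even one byte of a file changes, its SHA-256 digest changes. To be clear, this Python sketch only shows the comparison idea; it is not a PDF digital signature (which embeds a cryptographic signature over specified byte ranges inside the PDF itself), and the helper names and sample bytes are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large PDFs don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_original(copy_bytes: bytes, original_digest: str) -> bool:
    """True only if the copy is byte-for-byte identical to what was digested."""
    return sha256_hex(copy_bytes) == original_digest


original = b"%PDF-1.7 ... pretend document ..."
digest = sha256_hex(original)  # recorded when the document was released
tampered = original.replace(b"pretend", b"altered")

print(matches_original(original, digest))  # True
print(matches_original(tampered, digest))  # False
```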

 

Also PDFs are just a super useful format for people who have to scan paper documents.


PDFs produced from LaTeX are used in scientific and technical publishing, for many of the reasons you outlined. LaTeX because it is a language that can be translated into anything from a PDF to a web page, and it is an open standard. PDF because of that "digital printer" aspect: one can get a "printout" that looks like what will come out of the printer, WYSIWYG style.


I should also add that PDFs are now the de facto printer language, having almost entirely replaced PostScript (it's possible that some modern printers may not even be able to print PostScript files).

 

Sure, you can often send an image format like JPG or PNG directly, and most printers can handle it. But if you want to print a file that is going to be scaled perfectly to the resolution of the printer, it's basically guaranteed that it will be PDF. If you want to be able to scale a page up to poster size and have it not look like ass, it has to be PDF.

 

Its use in the modern printing pipeline is critical, and all by itself justifies PDF's continued existence. (Though the reasons others are giving are also good.)

 

Hmm I should probably just add this to the original posting.


There's probably more to the history of fax that's missing... but yeah, fax today makes no sense. At work I print out a paper, deforesting the planet, to do an order, then I fax it off and put it in the recycling bin, doing my part to save the planet...🤔

 

Anyway, one skill I have not learned is PDF. Oh well, I just have like 100 Notepad docs all over...🤷‍♂️


PDF is the only way for documents to be what-you-see-is-what-you-get when you print.
Typesetting before PDF was one of the biggest pains in the ass in the computer world.
https://www.youtube.com/results?search_query=computephile+pdf

Even RTF will look different in every program, and depending on how you print it.


My main use of PDF is that plenty of tasks on uni we get asked to deliver our answers as PDF 🙂

 

One thing I noticed last year is that if you format a text with images and stuff in Word, plenty of times its formatting does not look the same in Word Online. I find that kind of stupid because both are Microsoft products; it would be more understandable if they weren't.


7 hours ago, Taerin said:

I do a lot of PDF creation in my line of work, unfortunately, and fillable forms are a godsend. I can't imagine having everything in .doc format. So even if DLL and crew don't see the advantages of PDF, I do. Digital signatures in Word docs are kind of a joke, and controlling your document's integrity is a lot simpler thanks to enhancements available in the PDF format. I'm with team PDF.

Integrity is a great way of summing it up.

 

The last thing you want is for users to be able to easily screw up an entire document when you just want them to fill in a field. And oh boy, can users SCREW IT UP given the chance.

 

PDF gives a great chance at being able to slap the clumsy hand away from causing utter chaos.


Do you have a forum account just to complain about things you heard on the WAN show?

 

I find it strange to be so invested in a document format that you would type that much in response to 2 guys just talking shit to each other.


1 hour ago, Lunar River said:

Do you have a forum account just to complain about things you heard on the WAN show?

.... umm.... yes?

 

1 hour ago, Lunar River said:

I find it strange to be so invested in a document format, that you would type that much in response to 2 guys just talking shit to each other.

I can't stand seeing smart people being clueless.

And "2 guys just talking shit to each other"... the WAN show is a lot better than that (mostly).


20 minutes ago, Thomas A. Fine said:

And "2 guys just talking shit to each other"... the WAN show is a lot better than that (mostly).

Except it was like a 4-minute conversation in response to a sponsor they did an ad read for. The proportional response you gave is insanely over the top for something that was essentially an offhand remark.


PDF files are editable. When you create a PDF from scratch, you can add text, elements, move the elements around, etc. While I'm not getting into the technical parts of it...to say they're not "editable" is simply not true. My job requires me to manipulate them at times, like I described at the beginning of this post.

18 hours ago, Thomas A. Fine said:

I should also add that PDFs are now the de facto printer language, having almost entirely replaced PostScript (it's possible that some modern printers may not even be able to print PostScript files.)

I don't know about this. On Linux, the only way to be 100% certain your printer will work is if it is a PostScript printer. That said, I really don't see why, in this day and age, a printer has a driver in the traditional sense. It could just be a PDF that the computer sends along with commands telling the printer the paper type, paper source, draft-mode resolution, etc.

 


17 hours ago, starsmine said:

PDF is the only way for documents to be what-you-see-is-what-you-get when you print.
Typesetting before PDF was one of the biggest pains in the ass in the computer world.
Even RTF will look different in every program, and depending on how you print it.

God I cannot stress enough how much I hate people that bring .doc/.docx files for me to print. The formatting is almost always COMICALLY incorrect, and if I send them back home to convert it to a PDF lo and behold the formatting is absolutely perfect every single time.

 

It's also just nice being able to secure PDFs reasonably well. I just filled out a job application (entirely within Acrobat!) and exported my resume from Word into a PDF, and in both cases I made them conform to a standard which almost guarantees that they are not modifiable by anyone. It doesn't super matter for what I'm doing, sure, but it is still very nice to have that option.


54 minutes ago, Lunar River said:

except it was like a 4 minute conversation in response to a sponsor they had an ad read for. The proportional response you gave is insanely over the top for something that was essentially an offhand remark.

It takes as long as it takes to explain a thing.

 

Not interested?  Don't read it.


1 hour ago, Thomas A. Fine said:

.... umm.... yes?

 

I can't stand seeing smart people being clueless.

And "2 guys just talking shit to each other"... the WAN show is a lot better than that (mostly).

The podcast is NOT pre-researched on the topics being talked about. It does not matter how smart the people on any podcast are; if you don't pre-research, mistakes are expected, and you should not use it as a source.


7 minutes ago, starsmine said:

The podcast is NOT pre-researched on the topics being talked about. It does not matter how smart the people on any podcast are; if you don't pre-research, mistakes are expected, and you should not use it as a source.

Ok the next time I find myself writing a peer-reviewed paper, I won't cite the WAN show.

Sound advice.  10/10 Would read again.


lol, I don't think Fine was taking a dig at anyone on the WAN show, just stating his counter-opinion to what they stated as an opinion. Also, don't deify your heroes; you only give them farther to fall. It's a techy podcast, and we're just some people chatting on the Internet. *raises glass*


I just now got around to watching last week's WAN show, and I also stumbled across the PDF debate. Working in the publishing industry, I have to deal with PDFs on a daily basis.

To the OP:

Quote

So (in the short version) Adobe created an open standard

No, Adobe developed PDF in 1992 as a proprietary format. It was only after it was standardized by the ISO in 2008 that it became the open standard we know, use, and love today.

Quote

To be fair, no, PDF was never intended to be an editable format

On that we agree. PDF is intended to reproduce a document faithfully on all devices, independent of what hardware, software, or operating system the file is viewed or processed on.

 

In industry practice, PDF files are hardly ever edited by hand; instead, documents and graphics are (re-)converted into PDF from other source formats, like DOCX, InDesign, LaTeX, or various image formats, with the express purpose that those PDF files are not manipulated any further, to ensure that when shipped to a printing company or archived on some library's servers, the document or image looks the same at all times and in all places. That's why there are (basically) two PDF standards: PDF/X for print, and PDF/A for long-term archiving. If a PDF document wants to conform to either of those standards, it needs to be protected against editing and explicitly forbids dynamic elements like forms. And yes, you can still edit a protected PDF, but since PDFs always store creation and modification time stamps, manipulations to a PDF are traceable when compared to the original source.
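Those time stamps are stored as PDF date strings, typically in the document information dictionary under `/CreationDate` and `/ModDate`, in the form `D:YYYYMMDDHHmmSSOHH'mm'` (fields after the year may be omitted). A minimal Python sketch for turning one into a `datetime`, assuming that format; `parse_pdf_date` is a hypothetical helper name and the sample dates are invented.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional offset: Z, or +HH'mm' / -HH'mm'.
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string to a datetime (timezone-aware if an offset is given)."""
    m = PDF_DATE.match(value)
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = m.groups()
    tz = None
    if sign == "Z":
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz,
    )


# Invented example values, as they might appear for /CreationDate and /ModDate.
created = parse_pdf_date("D:20240212012100-05'00'")
modified = parse_pdf_date("D:20240213090000Z")
print(created.isoformat())  # 2024-02-12T01:21:00-05:00
print(modified > created)   # True
```

Both keys are optional and are set by whatever tool last saved the file, so they are most useful as a comparison against the original source, as described above.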


I find it a little staggering that someone in their position wouldn't see any use for PDF. PDF has problems and maybe with the benefits of 30 years of hindsight we could come up with a better format that carries the same benefits, however there's no disputing the need for something to fill that function.

 

To date, PDF is the only widespread format that allows you to view a document in exactly the same way across all devices and using different viewing software. MS Office documents sometimes don't even render properly across different versions of MS Office, let alone in different software suites or across platforms.

On 2/12/2024 at 1:21 AM, Uttamattamakin said:

I don't know about this. On Linux, the only way to be 100% certain your printer will work is if it is a PostScript printer. That said, I really don't see why, in this day and age, a printer has a driver in the traditional sense. It could just be a PDF that the computer sends along with commands telling the printer the paper type, paper source, draft-mode resolution, etc.

I suspect that's mainly because printers do not support all possible formats and having to guess among the hundreds of settings while rendering the PDF would be hell.
