
Why PDFs exist

13 minutes ago, Sauron said:

I suspect that's mainly because printers do not support all possible formats and having to guess among the hundreds of settings while rendering the PDF would be hell.

What I imagined is that these days every printer could run a small web-print server, with the driver just presenting a front end for that server. You know, like how at many big enterprises you just upload a PDF and can then print it out at any old printer on site.

 

At the same time, I see your point, because then point-and-click printing is gone.


2 minutes ago, Uttamattamakin said:

What I imagined is that these days every printer could run a small web-print server, with the driver just presenting a front end for that server. You know, like how at many big enterprises you just upload a PDF and can then print it out at any old printer on site.

Sure, but you're offloading the work onto the printer itself, which would essentially need to include a computer. That's typical of large office printers, less so of inexpensive home ones.



I work on an enterprise SaaS product where we let you store business files of all sorts, preview them, and convert them into various formats. File conversion has been my focus for years, and PDFs specifically have given me all sorts of trouble...

 

I like the idea of PDFs and am glad they exist to a degree, but unlike OP I do not see a beauty under the hood, only a partially standardized, Eldritch abomination of a format that has evolved from its pure PostScript roots into something that's really hard to work with. It's, like, really hard to write your own PDF parser and renderer (let alone editor) because the format is so obtuse, and every PDF-creating program produces different output that causes all sorts of edge cases. It's like web browsers: we like the idea that anyone can write their own and that the formats are open, but in practice everyone just consolidates on a few select implementations because it's simply too difficult. PDF seems to be heading the same way.

 

I know the history of PDF and have a certain respect for it in principle, but that doesn't mean I have to like the format, or like that we're still using it in a digital-first era. Personally I would prefer some form of multi-page SVG for document/printable interchange, or even XPS, which Microsoft tried to push. Describing documents as structured markup and data within a format makes them much easier to parse and manipulate.

 

And as far as avoiding corporate lock-in and control goes, PDF isn't some kind of savior. Adobe loves to add their own "extensions" to the PDF format that are very clearly only intended to ever work properly in Adobe Reader or Adobe Acrobat, often going out of their way to make the additions difficult for other programs to reverse-engineer (though some, like Foxit, manage to figure it out anyway). This is the "extend" part of EEE, a page out of Microsoft's old playbook, if you remember. Adobe never had pure, benevolent intentions in sharing PDF with the world; they were just more subtle about things than Microsoft.

 

The fact that you can add custom behaviors to a PDF with embedded JavaScript makes my stomach turn...


Didn't catch this on the WAN show, so I can't speak to any of what they said and won't put words in their mouths, but as someone who designs labs: PDFs are what we export our CAD/BIM plans to in order to send them to clients, mark them up, etc. If a better standard became dominant and served our purposes, I'm sure we'd use that instead, but as of now none has.

 

One note: we don't use Adobe Acrobat but a program called Bluebeam, which is far better for our purposes and incorporates some nice vector features. We can add things like a "scale" to the drawings so that we can take measurements directly off the PDF when needed. That might not sound useful to most people, but it's invaluable for us.


There is nothing wrong with PDF. It serves the purpose it was designed for: to save the output in a way that everyone sees the same result.

 

What's the alternative? Sending proprietary MS Word docs? Sending LibreOffice junk? Stuff that doesn't open in each other's software?

 

Imagine if the web was PDF. It would be large, but it would be the same on every device. Currently not the case because Google's claws are in the web standards and in the major browser engine used by Chromium-based browsers.

 

What is wrong is how many software vendors want PDF to be something it's not. Take all the "free" PDF viewers and editors. They let you do very little, basically just "here's a typewriter" to type over the document. There's no way to actually make edits to the PDF and retain its integrity. Enter digital signatures.

 

[image attachment]

This is a proprietary thing. If you send a "signed" document to someone else who doesn't have a reader that understands it, welp.

 

This happened when I signed something: I got fed up with how PDF-XChange wanted to do it, so I just "printed" it with the PDF printer driver. Done. It's fine. But if this were ever a legal situation, all this signing does nothing unless everyone uses the same software.

 

 

 


On 2/17/2024 at 5:12 PM, smcoakley said:

but unlike OP I do not see a beauty under the hood

I don't think I ever said it's a beautiful format.  It's definitely kind of a mess.  The things it accomplishes are actually pretty impressive though once you get familiar with PDF internals.  But they probably could have been accomplished more elegantly.  (On the other hand, PostScript is elegant and beautiful.)  My point was more that the platform independence of PDF helped block Microsoft from monopolizing yet another aspect of how we use computers.  That is a thing of beauty.

 

On 2/17/2024 at 5:12 PM, smcoakley said:

It's, like, really hard to write your own PDF parser and renderer

I can attest to this.  Writing a "simple" program that just strips images out of a PDF file is challenging.  There are several versions of PDF to support, and many options within each version (like the bizarre tricks they use to create PDFs that can be rendered before they finish loading, when normally the index is the last thing in a PDF).  Even simple forensic tools are tough, because there's more than one standardized way to store metadata, and a given file might use more than one method.

 

On 2/17/2024 at 5:12 PM, smcoakley said:

Personally I would prefer some form of multi-page SVG for document/printable interchange

Related to this, I'm in favor of web development being SVG first, with bits of HTML inserted as needed.  HTML is the worst page layout language never** invented (**it just sort of happened over time).  Back in 1992 I was telling the web community that HTML was going to grow into a mess if people kept cramming features in, and that we should instead develop web browsers that would use Display PostScript for any more advanced/fancy layouts, and leave HTML for only the most basic text documents.  Sadly they didn't take my advice on that issue, and my boss wouldn't let me compete with Andreessen and develop that idea into a web browser on my own.  He looked at me and said "the web is never going to amount to anything".

 

On 2/17/2024 at 5:12 PM, smcoakley said:

And as far as avoiding corporate lock-in and control, PDF isn't some kind of savior. Adobe loves to add their own "extensions" to the PDF format

Sure.  Adobe was hoping for the same control as Microsoft.  BUT Microsoft was trying to use their OS monopoly to throw their weight around, while Adobe was trying to get there by backing a quasi-open standard.  Did they see that as an advantage to them to do that?  Absolutely.  But between purely proprietary monopolistic evil corporate plots, and semi-open standard evil corporate plots, I'll take the semi-open standard route every time.


PDF seems fine to me. It's a format that most people can open, it works even if the recipient doesn't have the application that was used to create the document, and, most importantly when generating things like quotes, invoices, and legal correspondence, it's read-only to the recipient.


On 2/10/2024 at 11:25 PM, Thomas A. Fine said:

Watching the WAN show I was shocked to hear @LinusTech and Luke hating on PDF files and wondering why they exist, with Luke offering the bizarro-world comment that you can't edit PDF files.  (To be fair, Linus gave a rough answer that was vaguely in the right direction, but missed a critical element.)

 

In simplest terms PDFs exist because if one company had proprietary control over all document interchange, that would be evil.  Microsoft was working hard to be that evil company, and most Windows losers [oops] were happy to send each other .doc files when they created a document and wanted to send it electronically, rather than by snail mail.  Sending .doc files worked fine, until you sent one to a Mac user or a Unix user, or an Amiga user etc.  So (in the short version) Adobe created an open standard for document interchange that anyone could read anywhere with a free browser provided by Adobe, or developed by anyone who could read the open standard.

 

Sure, there were early efforts at developing software on Mac and Unix and other OSes that were in play back then to read and write .doc files, but they were clunky and worked poorly, and Microsoft actively "upgraded" their proprietary format to frustrate this exact sort of development.

 

So the reason I say I was shocked is because Linus is usually more savvy about giant corporations trying to control information through proprietary standards, and the need for open standards to avoid that nonsense.

 

To be fair, no, PDF was never intended to be an editable format.  Luke was on the right track here when he said it's like a digital printer.  When Word was young, the endpoint of documents was printing them out.  But there was a need to sometimes have the endpoint be sending them electronically.  And, like a printout, there was no goal for the document to be editable.  The goal was to share a "printout" electronically, with anyone, whether they had purchased Word from giant evil Microsoft or not.

 

A lot of younger people today don't appreciate how much more evil Microsoft was in the early years.  The browser wars are a well-known example, but they weren't an isolated event.  Microsoft tried to do this whenever they could, and document exchange was absolutely not an exception to their desire to be evil and monopolistically control everything that happened on computers.

 

The longer story of why PDFs exist (in the form that they exist) is perhaps more technically interesting.  Before PDFs, the preferred non-proprietary format was another open Adobe document format, PostScript.  Luke would really appreciate this, as would any programmer, because PostScript was (and still is) FREAKING AWESOME.  PostScript, for those who don't know, is a standard for getting graphical information from document software to a printer, but the really cool part is that it's a complete programming language, with powerful graphics and text capabilities built in.  It has loops and functions and complex data types, and works as an interpreted language.  It's all in reverse Polish notation, which is trippy, but really fun to program in.  Reverse Polish notation means that it is a stack-based language: operands are pushed onto a stack, and when an operator comes along it is executed immediately, using up data from the stack as needed.  As a trivial example, consider this code fragment that draws a gray box with a black outline:

0 setgray       %set the current color to black
newpath         %start a new path
10 10 moveto    %move the current point to 10,10
100 10 lineto   %add a line segment from the current point to 100,10, and then make 100,10 the current point
100 100 lineto  %continue the outline up to 100,100
10 100 lineto   %and across to 10,100
closepath       %add a line segment (if needed) to make this path a closed shape
gsave           %save the current graphics state (i.e. the path we just built, plus other things like current color)
0.5 setgray     %set color to 50% gray
fill            %fill the current path, and then dispose of that path
grestore        %put the last saved graphics state back (restore the path we just used up)
4 setlinewidth  %make the outline 4 units wide (points, by default)
stroke          %draw the current path in the current color and dispose of the current path
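
For anyone who hasn't met a stack-based language before, the evaluation model can be sketched in a few lines of Python. This is only a toy illustration, not PostScript: the function name `run_rpn` and the tiny operator table are made up for this example, and a real PostScript interpreter also has dictionaries, procedures, and a graphics state.

```python
def run_rpn(source):
    """Evaluate a whitespace-separated RPN program and return the final stack."""
    # Hypothetical operator table: name -> (operand count, function).
    # The names echo PostScript's arithmetic operators, but this is a toy.
    operators = {
        "add": (2, lambda a, b: a + b),
        "sub": (2, lambda a, b: a - b),
        "mul": (2, lambda a, b: a * b),
        "dup": (1, lambda a: (a, a)),
    }
    stack = []
    for token in source.split():
        if token in operators:
            arity, func = operators[token]
            if len(stack) < arity:
                raise ValueError(f"stack underflow at {token!r}")
            # Take the operands off the top of the stack, oldest first.
            args = stack[-arity:]
            del stack[-arity:]
            result = func(*args)
            # An operator may leave one value or several on the stack.
            stack.extend(result if isinstance(result, tuple) else (result,))
        else:
            # Anything that is not an operator is treated as a number: push it.
            stack.append(float(token))
    return stack


print(run_rpn("10 100 add 2 mul"))  # (10 + 100) * 2
```

Operands pile up until an operator arrives, and the operator consumes what it needs, which is the same pattern as `10 10 moveto` in the fragment above.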

 

PostScript was great if you were a geek and wanted to write a program in 35 lines that would draw a seal, and then send that to the printer.  But most PostScript was produced by printer drivers and they were terrible, and would include in any PostScript output the entirety of the printing functions that the writers of that driver had created to "simplify" turning documents into printed output.  So "Hello World" written by me, by hand, would be four lines, but written by a printer driver could  be like a megabyte, which in those days was insanely large.

 

As cool as it is that PostScript is a complete printer language, in practice it produced overly large documents, and they were slow to print, because the program had to be run and computers (especially the ones they stuck inside printers) were not that fast back then.  And PostScript was also being used beyond printers as a generic document exchange format (again, because sending .doc files is evil, or at least it was then).  So PDF was created as a new approach to digital output from word processors that could be sent to printers, but could also be exchanged among people without having to pay the Micro$oft tax, and be used more efficiently to display documents without printing them.

 

PDF does still include a PostScript-like graphical language inside that's used for all scalable graphics, and as the basis for the text capabilities of PDF.  But PDF is also a container format that lets you include image files in a variety of formats.  It specifies all things as objects, and describes what objects are on what pages, where they are on the page, and how they're arranged.  It's also indexed, so that printers and readers can process it much more efficiently and (for readers in particular) jump around to different pages without having to reprocess the entire document again, as one would need to do for PostScript files.  (There were actually some standardized PostScript formatting conventions that also allowed for this by adding structure to the PostScript documents in comments, and they were widely used, but kind of a hack.)

 

PDF added the (optional) ability to be searchable by specifying the actual text flow and rough location of all the text in a document, because without this, text was only described graphically, not necessarily in order, not necessarily even as whole words.  Later versions of PDF made it possible to begin displaying the PDF while it was still loading (in early versions the object index came last, so you couldn't display anything until you had a complete file).
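
To make "the index came last" concrete: a classic PDF ends with the keyword `startxref`, the byte offset of the cross-reference table, and an `%%EOF` marker, so a reader starts at the tail of the file and works backwards. Here is a rough Python sketch of that first step. The function name `find_startxref` and the 1024-byte tail window are assumptions made for this example, and the sample bytes are a hand-made stand-in rather than a valid PDF. Real files can be messier (incremental updates add extra trailers, and newer files may use cross-reference streams instead of a plain table).

```python
import re


def find_startxref(pdf_bytes):
    """Return the offset stored after the last 'startxref' keyword, or None."""
    # Only look at the tail of the file; the trailer is expected near the end.
    tail = pdf_bytes[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    # With incremental updates there can be several; the last one wins.
    return int(matches[-1]) if matches else None


# Hand-made stand-in for the end of a PDF file (not a valid document).
sample = (
    b"%PDF-1.4\n"
    b"...objects...\n"
    b"xref\n0 1\n0000000000 65535 f \n"
    b"trailer\n<< /Size 1 >>\n"
    b"startxref\n23\n%%EOF\n"
)

offset = find_startxref(sample)
print(offset, sample[offset:offset + 4])
```

This is also why an early-style PDF can't be displayed until the whole file has arrived: the pointer to the index is in the last few bytes.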

 

So yeah.  It isn't editable**.  It was never supposed to be editable.  It was supposed to be a way for people to look at "printouts" without printing them, to send them back and forth intact, and to not have to pay Micro$oft for the privilege of doing any of this.

 

The idea that the recipient of a file wouldn't need the same program that the sender used to create it is critically important in the history of software.  Think of it as "right to repair", but in this case it was "right to read documents without paying a vendor tax".

 

Hey, you know what, this might make for an interesting TechQuickie.  Also, I encourage Luke to spend an afternoon learning some PostScript basics.  It's a very satisfying programming experience.  You'll need a PostScript viewer, and these days I'm not sure what supports that.

 

**Fun fact: you can load a PDF file into a text editor, search for JPEG objects within the file, trim everything above and below any set of JPEG data, save the result to a new file, and just view it as a JPG.  Because it is a JPG.  Because PDF is a container format, not unlike a zip file (credit to Dan for making that comparison).  But a container format crammed with document layout and presentation information.  I actually do this to get images out of PDFs sometimes (modern viewers sometimes allow you to just do this through the interface, and there are also command-line programs to extract images, but hey, I'm old school).  Anyway, the point here is that, for me, PDF files are "editable" because I can do this.
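
The text-editor trick above can be roughly automated. The following Python sketch scans the raw bytes for JPEG start and end markers. It assumes the image was stored as a plain JPEG stream (the `DCTDecode` filter with nothing else layered on top); `carve_jpegs` is a made-up name, and the sample data is a fake stand-in rather than a real PDF or a real JPEG. It is a heuristic, not a parser, so it can miss images or return junk on unusual files.

```python
def carve_jpegs(pdf_bytes):
    """Return byte strings that look like JPEG files embedded in pdf_bytes."""
    soi = b"\xff\xd8\xff"  # JPEG start-of-image, plus the first byte of the next marker
    eoi = b"\xff\xd9"      # JPEG end-of-image marker
    images = []
    pos = 0
    while True:
        start = pdf_bytes.find(soi, pos)
        if start == -1:
            break
        # Assumption: the JPEG sits inside a PDF stream, so stop at 'endstream'.
        limit = pdf_bytes.find(b"endstream", start)
        if limit == -1:
            limit = len(pdf_bytes)
        # Use the last end marker before the stream ends, since an earlier one
        # may belong to a thumbnail embedded inside the JPEG itself.
        end = pdf_bytes.rfind(eoi, start, limit)
        if end == -1:
            pos = start + len(soi)
            continue
        images.append(pdf_bytes[start:end + len(eoi)])
        pos = limit
    return images


# Fake data for demonstration only: not a real JPEG and not a valid PDF.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"JFIFdata" + b"\xff\xd9"
fake_pdf = (
    b"%PDF-1.4\n1 0 obj\n<< /Filter /DCTDecode >>\nstream\n"
    + fake_jpeg
    + b"\nendstream\nendobj\n"
)

found = carve_jpegs(fake_pdf)
print(len(found), found[0] == fake_jpeg)
```

On a real file you would read the PDF in binary mode and write each returned byte string out with a `.jpg` extension.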

 

I'm editing this to add that an absolutely critical modern use of PDFs is in printing. They are now the de facto printer language, having almost entirely replaced PostScript (it's possible that some modern printers may not even be able to print PostScript files.)  Sure you can often directly send an image format like jpg or png and most printers can handle it.  But if you want to print a file that is going to be scaled perfectly to the resolution of the printer, it's basically guaranteed that it will be PDF.  If you want to be able to scale a page up to poster size and have it not look like ass, it has to be PDF.

Ya lost me when you went all Trump.  Not sure why the need to start off insulting people, but maybe not hold your supreme overlord so tightly?

 

"and most Windows losers [oops] "

 

Really?



4 hours ago, Dedayog said:

Ya lost me when you went all Trump.  Not sure why the need to start off insulting people, but maybe not hold your supreme overlord so tightly?

 

"and most Windows losers [oops] "

 

Really?

In my defense, this issue is central to what I'm talking about.  The casualness with which Windows users have always thoughtlessly just emailed a .doc file, as if it is some sort of standard, has been endlessly frustrating for the entire non-Windows world.  So that was an attempt to cover a lot of territory in a brief light-hearted way.  Ultimately, sending a proprietary format document to someone who can't read it is a mix of cluelessness and hubris on the part of Windows users.

 

Of course, I'm also a long time sufferer of Linux Superiority Syndrome.  But I have the worst form of it — I've had it so long that in my case it's actually Unix Superiority Syndrome.  There is no known cure, and in fact nobody who has either of these syndromes wants a cure.

 

Thirdly, as a sysadmin, there's a very long history of carrying the attitude generally that "Users are lusers".  I have literally seen this sign hung, hidden from customer view, in real actual sysadmin offices.  In my case I do fight for a cure on this issue, as the reality is that users are not "lusers" but the reason you have a job.  Nevertheless it carries subconscious influence over me, and makes it easier for me to refer to any user as a loser.


19 hours ago, Kisai said:

What's the alternative? sending proprietary MS word docs? Sending LibreOffice junk? Stuff that doesn't open in each other's software?

I believe the best alternative would be a new open standard that is similar to PDF in concept but technically better under the hood, and easier for many vendors to implement support for. Part of this alternative would be driving adoption so that most programs and systems support it. I can't say this is a realistic alternative, but I'd like it if someone attempted it.

 

Given the current status quo I agree that PDFs are the best option.


1 hour ago, smcoakley said:

I believe the best alternative would be a new open standard that is similar to PDF in concept but technically better under the hood, and easier for many vendors to implement support for. Part of this alternative would be driving adoption so that most programs and systems support it. I can't say this is a realistic alternative, but I'd like it if someone attempted it.

 

Given the current status quo I agree that PDFs are the best option.

[image attachment: standards.png]

 


On 2/21/2024 at 4:56 AM, Thomas A. Fine said:

In my defense, this issue is central to what I'm talking about.  The casualness with which Windows users have always thoughtlessly just emailed a .doc file, as if it is some sort of standard, has been endlessly frustrating for the entire non-Windows world.  So that was an attempt to cover a lot of territory in a brief light-hearted way.  Ultimately, sending a proprietary format document to someone who can't read it is a mix of cluelessness and hubris on the part of Windows users.

 

Of course, I'm also a long time sufferer of Linux Superiority Syndrome.  But I have the worst form of it — I've had it so long that in my case it's actually Unix Superiority Syndrome.  There is no known cure, and in fact nobody who has either of these syndromes wants a cure.

 

Thirdly, as a sysadmin, there's a very long history of carrying the attitude generally that "Users are lusers".  I have literally seen this sign hung, hidden from customer view, in real actual sysadmin offices.  In my case I do fight for a cure on this issue, as the reality is that users are not "lusers" but the reason you have a job.  Nevertheless it carries subconscious influence over me, and makes it easier for me to refer to any user as a loser.

Thomas, your Stallman is showing! I'm going to go out on a limb and suggest you had massive karma and a four digit user id on Slashdot. Do you read a lot of BOFH? It's crazy to me the gap between the Microsoft Borg/SCO days and now. 

 

Oh god I'm old.


As someone who had to use a low-level library to generate fancy-looking signed PDFs: hooo boy, that was a headache. But it was also the first time in forever that I got to do some graphics work professionally instead of writing game engines as a hobby. And once I had written the generation library, it was very clean and easy to use for generating PDF/A files of a specific style. But if someone had ideas about style changes, they would have to redo a lot of fiddly bits, or just fork out the money for an HTML-to-PDF library. I did not get that library because a month of my time was supposedly cheaper. (It was not, by a very wide margin.)

 

Anywhoo... PDFs have a place, and I greatly prefer them as a way to send documents to people. Especially recruiters and marketers that might get ideas about putting some extra stuff in the consultant profile if they get an editable document. 

 

And lest we forget, there is TIFF as well, that can be used as a multiple page document format. 


On 2/22/2024 at 9:35 PM, fartmuncher69_420 said:

Thomas, your Stallman is showing! I'm going to go out on a limb and suggest you had massive karma and a four digit user id on Slashdot. Do you read a lot of BOFH? It's crazy to me the gap between the Microsoft Borg/SCO days and now. 

 

Oh god I'm old.

I broke into his account once upon a time.

 

And yes, I've read some BOFH.  Would actually make a great animated series, wouldn't it?
