Anyone know if there is some paper, blog, whatever explaining why academic/scholarly journal articles continue to be ~99% published in PDF instead of HTML?

I'm not being snarky, I'm genuinely baffled and interested in knowing the answer.

@hugh I have no empirically-based answer, but I think it might have to do with PDF ~feeling more scholarly~ or something? Or that PDFs are easier to save to the desktop or to reference managers for future use, or that people still like the look of PDFs as a surrogate for ye olde paper.

@alissa @hugh

It seems more fixed. A lot of editors export to it directly without formatting changes. It's also what journals use because their older articles are actually scanned in.

@hugh Could it be a holdover from printing? I've heard PDF is preferred by most professional printers.

Could also be DRM 😷

@hugh possibly just a hangover from when not every desktop computer had an internet connection and a web browser installed?

@hugh No reference, but I vaguely recall something about pagination maybe being involved: if you aren't directly quoting someone (allowing a text search) but just paraphrasing a claim, or adding a 'see also', or something like that, then citations with page numbers direct readers to the relevant text. That can be solved with e.g. numbered paragraphs, but style guides are a possible source of inertia...

@GardenOfForkingPaths I have a LOT of anecdotal evidence supporting this, though I suspect they feed into each other (also some PDF articles have no page numbers!)

@hugh PDFs will look the same on any compliant reader. Any formulas, graphics, fonts etc. will display exactly how it's supplied.

HTML has no such guarantees.

@vi This is precisely the reason PDFs are a problem.

@vi ever tried reading a scholarly article PDF on a phone?

@hugh No reference, but given some experiences teaching faculty to use co-editing tools (wikis, etc.), I think PDFs represent "fixed" to many people. Non-fixed documents seem to make some people very, very nervous. And PDFs are fixed--pretty much what you see on one machine is what you see on another. NOT true for HTML.

Also, inertia. 😀

@hugh this is 100% anecdotal, but ever since I started grad school, I went from hating PDFs to loving them. If I ever have the option to read a document as PDF or HTML (or something else) I always choose PDF now. There are a few reasons why:

1. PDFs fit into my workflow. I can easily download a bunch of documents to my Dropbox, organize them into folders, read them on my iPad, annotate them, and send them to people. There’s zero friction.

@hugh 2. Reading PDFs is easy. On an iPad-size screen, it’s very similar to reading paper. This also breaks down into two related points:

2a. When you’re reading for hours at a time, pagination is far superior to scrolling.

2b. When a journal database gives you HTML, there’s typically zero effort put into its readability. The font will be tiny, the lines will be too long. Images won’t display properly. It’s awful.

@hugh 3. Every PDF reader comes with annotation features. I can always mark up my documents, no matter what system I’m using. I can also send them to a colleague without ever worrying about whether they’ll be able to read them.

Now, imagine trying to replicate one of these points with HTML. Even when it’s technically possible, there’s too much friction. Software is too inconsistent. The HTML ecosystem just isn’t built for the same things that the PDF ecosystem is.

More of a response to other responses:

Poor formatting of HTML articles provided by journals: this may change with increased demand for HTML articles, and it isn't necessarily a problem if a `Pocket/Readability/Instapaper for Academic Articles` emerged.

Annotation systems could be set up fairly easily for HTML content, I suspect, especially with a reader program. [Annotation of PDFs isn't widely available in #Linux readers, I think.]

@bthall yeah, is working in the annotation space, and you're 100% right about formatting going where the demand is (i.e. it's a circular argument). The most compelling explanation for me was the meta point about portability (I guess it's right there in the name!). The offline portability piece seems to be the real issue. I'm wondering if ePub or something like it might be part of the solution.

Yeah, I had ePub in mind when writing my post, too. I think that it would work fairly well, especially with a reconfigured ePub reader to fit articles rather than books and sync one's library across devices. HTML to ePub is fairly trivial, LaTeX is at least sometimes rendered in ePub readers, and SVG is as well for equations, graphs and diagrams (I don't know why I haven't heard of LaTeX to SVG being a thing)

@bthall the main thing holding ePub back is its association with books and DRM I suspect. If you called it a ‘bundled webpage’ or something it might be looked at differently.

@hugh I just found this, which seems to be a means of distributing ePub documents (books, articles):

It's supported by a number of apps. End users would need to configure their reader devices/apps for a given document provider's catalog, but among academics that might not be a big thing to ask.

Aus GLAM Space

This is a Mastodon instance primarily for Australasian Galleries, Libraries, Archives, Museums and Records people, and anyone else who wants to hang out with them. Loosely associated with newCardigan