PDF in Electronic Submissions
Before there was eCTD (the electronic Common Technical Document), there were submissions using just PDF, the Portable Document Format created by Adobe. (And before PDF, there were submissions using interactive PostScript, TIFF images, DAMOS, etc., but that’s a scary story for another time, perhaps around a campfire.)
The PDF file format was something of a miracle: a device-independent way of representing the contents of a page, independent of Mac, Windows or Unix, and of individual printer requirements. There were competitors that haven’t survived (and Microsoft has had several of its own, most recently XPS, which was introduced with Windows Vista and never gained much traction). Adobe published the specs openly and almost immediately had competitors to its Acrobat product, and PDF became ISO standard 32000-1 in 2008.
A PDF file is essentially all the PostScript commands that would be used to draw text and graphics on each page, with some of the cool programming language features of PostScript trimmed out, and other features added for interactivity, such as bookmarks and hyperlinks, plus an index to quickly get to particular pages. Because PDF was an open standard, agencies could accept PDF files without playing favorites with software vendors (although Acrobat was the de facto standard, even though the Reader application was not originally free).
By 1999, FDA was accepting submissions using just PDF – a hyperlinked table of contents and each document in folders. This is familiar as the “NeES” format (Non-eCTD Electronic Submission) still in use in many parts of the world, although no longer permitted for many submission types in the US and Europe.
Acrobat Plug-Ins for Automation
Because of the open standard, and the programming interface published by Adobe for add-ins to Acrobat, both Marketing Authorisation Holders and vendors began building tools to simplify document assembly, create the bookmark and hyperlink navigation, and build the entire submission. I’d written a couple in a previous position at a pharma company that helped publishers build bookmarks with fewer mouse clicks and keystrokes. Adobe kept enhancing its PDF tools, nibbling away at the feature lists of the vendors’ packages, but three major factors made it worth using a commercial set of plug-ins to Acrobat for PDF publishing:
- The agencies had strict criteria for the documents they would accept, to ensure that they would be able to read and print those documents using the oldest computers on their desks, and that the documents would continue to work in the future. This included limiting the PDF versions that could be used, forbidding password protection so that annotations could be made, limiting the typefaces (fonts) that could be used, and requiring that no additional software from the sponsor, such as Acrobat plug-ins, be needed for the review.
- The agencies required that any hyperlinks and bookmarks have a valid destination: you must be able to get to where the link is supposed to go (more on this below).
- Software used for electronic submissions is subject to additional quality assurance and auditing, generally requiring documented validation that it functions correctly. The best known of the rules for that is Part 11 of Title 21 of the US Code of Federal Regulations, which FDA enforces (generally written as 21 CFR Part 11, or just Part 11). This limited which vendors were used for submission assembly, as it is rather costly to do such software validation.
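As a small illustration of the first criterion, here is a minimal sketch of a PDF version check. It relies on the fact that a PDF file starts with a header line such as `%PDF-1.4`. The function names and the `ALLOWED_VERSIONS` set are made up for this example, and the set is only a placeholder: the versions an agency accepts must come from its current guidance.

```python
import re

# Assumption: placeholder list for illustration only; each agency
# publishes its own accepted PDF versions.
ALLOWED_VERSIONS = {"1.4", "1.5", "1.6", "1.7"}


def pdf_header_version(data: bytes):
    """Return the version from a header such as b'%PDF-1.4', or None."""
    match = re.match(rb"%PDF-(\d\.\d)", data[:16])
    return match.group(1).decode("ascii") if match else None


def version_is_acceptable(data: bytes, allowed=ALLOWED_VERSIONS) -> bool:
    """True when the header version is present and in the allowed set."""
    version = pdf_header_version(data)
    return version is not None and version in allowed


if __name__ == "__main__":
    print(version_is_acceptable(b"%PDF-1.4\n..."))  # header says 1.4
    print(version_is_acceptable(b"%PDF-2.0\n..."))  # header says 2.0
```

This only reads the header. From PDF 1.4 onward, a `/Version` entry in the document catalog can override the header, so a real tool would need to parse the file as well.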
Finding and Fixing Broken Links
The validity of hyperlinks and bookmarks is usually pretty simple: Anything that goes to a page within the same document is easily validated, and will stay working forever. It’s when links and bookmarks go outside of the current document that things can become hinky, because the destination can move or change. Using “relative links” that aren’t anchored to a particular storage system is key, but it isn’t a 100% cure. Examples of failure conditions include replacing a method validation that several studies link to with a new document, or agencies rearranging how applications are stored relative to each other. Part of the problem is that these so-called “broken links” may still arrive at a document (depending on the agency review system), but that document could be out of date.
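To show why relative links survive a move of the whole submission, here is a minimal sketch using Python’s standard `posixpath` module. The folder names and the `resolve_relative_link` helper are hypothetical, chosen only to resemble a submission layout.

```python
import posixpath


def resolve_relative_link(source_doc: str, target: str) -> str:
    """Resolve a link target relative to the folder holding the source document."""
    base = posixpath.dirname(source_doc)
    return posixpath.normpath(posixpath.join(base, target))


if __name__ == "__main__":
    source = "m5/study-101/report.pdf"          # hypothetical study report
    target = "../../m3/method-validation.pdf"   # hypothetical link target
    # The same relative link resolves correctly wherever the submission lives.
    for root in ("/agency/app-001/0000", "/archive/2024/app-001/0000"):
        print(resolve_relative_link(posixpath.join(root, source), target))
```

The link keeps working because the source and the destination move together. It stops working as soon as they move independently, which is exactly what happens when applications are rearranged relative to each other.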
These problems are nothing new: Ted Nelson, who coined the word “hypertext” in 1963, founded Project Xanadu in 1960 and designed it around two-way connections between the sources and destinations of links, but that idea never became part of HTML (Hypertext Markup Language) or the internet’s backbone.
Detecting those problems is relatively simple, but computationally expensive: Every bookmark and hyperlink must be examined, and the link checked to see if its destination exists and is still up to date. This is relatively easy within a single submission, and becomes more complex as things pile up, such as additional submission sequences, or other applications referring to the same documents. Features of ACUTA’s APT (ACUTA PDF Tools), including the Link Wizard, the LinkInfo Wizard and especially the QC Wizard, help automate that validation process and reduce the time and effort of the document analysis. The QC Wizard examines a group of documents, analyzing the links and bookmarks as well as the page and font information for each document. It produces a report with color-coded indicators to help find and correct issues in documents. ACUTA’s ARIM Viewer includes similar tools for reviewing and repairing links between documents, and helps ensure that submissions will work properly when they arrive at the agency.
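The basic existence check described above can be sketched in a few lines. This is not how APT or ARIM work internally; it is a simplified illustration that assumes the links and page counts have already been extracted from the PDFs into plain Python data. The `Link` type, the `find_broken_links` function and the file names are all hypothetical.

```python
import posixpath
from typing import NamedTuple


class Link(NamedTuple):
    source: str  # document containing the link or bookmark
    target: str  # relative path to the destination document
    page: int    # 1-based destination page


def find_broken_links(page_counts: dict, links: list) -> list:
    """Return (link, reason) pairs for links whose destination cannot be reached."""
    problems = []
    for link in links:
        folder = posixpath.dirname(link.source)
        dest = posixpath.normpath(posixpath.join(folder, link.target))
        if dest not in page_counts:
            problems.append((link, "missing document: " + dest))
        elif not 1 <= link.page <= page_counts[dest]:
            problems.append((link, "page out of range in " + dest))
    return problems


if __name__ == "__main__":
    # Hypothetical submission: document path -> number of pages.
    page_counts = {"m3/method-validation.pdf": 40, "m5/study-101/report.pdf": 250}
    links = [
        Link("m5/study-101/report.pdf", "../../m3/method-validation.pdf", 12),
        Link("m5/study-101/report.pdf", "../../m3/method-validation.pdf", 41),
        Link("m5/study-101/report.pdf", "../study-102/report.pdf", 1),
    ]
    for link, reason in find_broken_links(page_counts, links):
        print(link.source, "->", reason)
```

This sketch only confirms that a destination exists. Knowing whether the destination is still up to date needs lifecycle information from the submission itself, which is where the cost and complexity come from.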
Resolving broken links can mean re-submitting all the documents that link to a replaced document. Japan’s health ministry insists that this be done, which is mitigated by the fact that there are typically fewer sequences in Japanese submissions, and no cross-application connections. FDA has suggested that cross-document links be limited; after all, it should be possible to browse to the same place via the application table of contents in the agency’s review system. On the other hand, there is a desire to provide the agencies with as many aids as possible to speed review.
The Future of Linking?
The forthcoming eCTD version 4 is designed to encourage document re-use across applications, which may exacerbate this problem. The links between the table of contents and the documents are streamlined by using a Universally Unique ID (UUID) in the XML backbone. This does not help the hyperlinks, though. During the development of the RPS standard used for eCTD 4, I had suggested including features that would help with link destinations, but this was deemed too complex (perhaps for eCTD 5?). Even without that, there are ways to get to the right place even if the sources of links haven’t been updated, but they would require specialized software in either the server or the desktop/browser tools used for the eCTD review. These would load the eCTD backbone and “notice” when a link would lead to a document that has been made obsolete. That is easier in eCTD 4, but so far there are no proposals to do this.
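One way such a tool might “notice” an obsolete destination is sketched below. It assumes the replacement history has already been read from the backbone into a simple table mapping each replaced document to its replacement. That table, the `current_version` function and the file names are hypothetical; nothing here reflects the actual eCTD XML or any existing review system.

```python
def current_version(path: str, replaced_by: dict) -> str:
    """Follow the replacement chain from a link destination to the current document."""
    seen = set()
    while path in replaced_by:
        if path in seen:
            # Guard against a malformed history that loops back on itself.
            raise ValueError("cycle in replacement chain at " + path)
        seen.add(path)
        path = replaced_by[path]
    return path


if __name__ == "__main__":
    # Hypothetical history: the method validation was replaced twice.
    replaced_by = {
        "0000/m3/method-validation.pdf": "0003/m3/method-validation.pdf",
        "0003/m3/method-validation.pdf": "0007/m3/method-validation.pdf",
    }
    old_target = "0000/m3/method-validation.pdf"
    print(old_target, "->", current_version(old_target, replaced_by))
```

A viewer using this approach could either redirect the reader to the current document or simply warn that the link points at a superseded one, without the source PDF ever being re-submitted.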
The sheer number of bookmarks and hyperlinks in an eCTD or NeES submission means that there’s no practical way to ensure quality without an army of minions… and good tools. Contact ACUTA for information on how ACUTA PDF Tools and ARIM can help with your submissions.
Image Credit Stevie J Brown Photography