
eCTD 4: Unlocking Keywords, Part 2 (wonkish)

In the most recent post on eCTD 4, I explained the basics of how the metadata that classifies documents in the electronic Common Technical Document (eCTD) is organized, and how ACUTA’s ARIM Publishing Module will need to evolve to support it.

This post gets into the specifics of what must happen behind the scenes to support that in the eCTD 4 XML, which is based on the Health Level Seven (HL7) Regulated Product Submission (RPS) standard. Fair warning: this is going to get technical, and the XML will be discussed in all its gory detail.

The first thing to note is that keywords have two parts in eCTD 4:

  • The definition of a keyword, which is ‘owned’ by the application and can be referred to in any later sequence
  • The reference to a keyword by its code in the Context of Use (the rough equivalent of the current eCTD’s <leaf> element)

The keyword references within the table of contents (Context of Use) entries are part of the message shared by all applications bundled together (the submission unit), while the keyword definitions and the documents themselves are ‘owned’ by the individual applications. Defining the keyword separately lets the sponsor reuse the codes in subsequent sequences and, best of all, change the term’s description if it was entered incorrectly or if, for instance, the trade name changes during negotiations. Compare this to the current eCTD, where changing the metadata requires marking every leaf element that uses the term as deleted and then re-adding it with the new term.
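To make the difference concrete, here is a minimal Python sketch of the idea. It does not reproduce ARIM or the actual eCTD 4 XML; it only assumes that a Context of Use entry stores a (code, code owner) pair for each keyword. The dictionaries, document names, the OID (taken from the 2.999 arc reserved for examples) and the new trade name are all hypothetical.

```python
# Hypothetical OID from the 2.999 arc, which is reserved for examples.
SPONSOR_OID = "2.999.20"

# Keyword definitions owned by one application, keyed by (code, code owner).
definitions = {("PRD-HLPS", SPONSOR_OID): "Hungrilapse"}

# Context of Use entries (hypothetical documents) hold only keyword references.
contexts_of_use = [
    {"document": "label.pdf", "keywords": [("PRD-HLPS", SPONSOR_OID)]},
    {"document": "carton.pdf", "keywords": [("PRD-HLPS", SPONSOR_OID)]},
]


def display_names(context: dict) -> list[str]:
    """Resolve a Context of Use entry's keyword references to display names."""
    return [definitions[ref] for ref in context["keywords"]]


print([display_names(c) for c in contexts_of_use])  # [['Hungrilapse'], ['Hungrilapse']]

# A later sequence redefines the same code with a new (hypothetical) trade name...
definitions[("PRD-HLPS", SPONSOR_OID)] = "Hungrilapse XR"
# ...and every reference resolves to it; no entry is deleted or re-added.
print([display_names(c) for c in contexts_of_use])  # [['Hungrilapse XR'], ['Hungrilapse XR']]
```

Because none of the Context of Use entries change, there is nothing to delete and re-add.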

The keyword definition itself is pretty complex. There are actually five levels of coding that make up a single definition, but only the last two (the code and its owner) are used when the keyword is referenced in the Context of Use:

  • The code set – what kind of code it is, such as products or manufacturers. ICH unwisely named these ich_keyword_type_3 for manufacturers and ich_keyword_type_4 for products. Would it have killed them to call them “ich_keyword_product” and so on?
  • The owner of the code set – who defined the code set. This will be an OID (Object Identifier, a subject for another time) indicating who controls the list of code sets: either ICH (International Council for Harmonisation) or an agency such as FDA. For eCTD 4, FDA has defined only a few additional keyword types: for forms, promotional material type, and promotional material audience type (though these are predefined, so they will never be used in a keyword definition; see below).
  • The display name – what a human being would read, e.g. “Hungrilapse” or “Stuffmaker PR” in the example in last week’s post.
  • The code – what the sponsor defines as the code to be used in keyword references. The agencies’ examples usually prefix the code with a clue to what kind of code it is, for reasons that will become evident shortly. So the code might be “PRD-HLPS” for the product and “MFR-SMKPR” for the manufacturer.
  • The owner of the code – who defined this particular code. In the definitions in an eCTD 4 message, this will always be an OID that defines the sponsor/MAH’s keyword list. This is needed because the keywords predefined in controlled vocabularies will also have a code owner, again either ICH or the agency.

In the XML of a keyword definition, six values carry the meaning: the code set, the owner of the code set, the status of the code (which can be set to “suspended” in later sequences), the code, the owner of the code, and the display name.
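One way to picture those six values is as a single record. The sketch below is a hypothetical Python model, not ARIM’s data model or the eCTD 4 schema: the field names, the example OIDs and the status value “active” are assumptions (only “suspended” comes from the text above). It also shows that only the code and its owner travel into a Context of Use reference, so suspending a keyword leaves that reference key unchanged.

```python
from dataclasses import dataclass, replace

# Hypothetical OIDs from the 2.999 arc reserved for examples.
ICH_OID = "2.999.10"      # stands in for the owner of the ICH code sets
SPONSOR_OID = "2.999.20"  # stands in for the sponsor/MAH keyword list


@dataclass(frozen=True)
class KeywordDefinition:
    """The six values of a keyword definition (field names are hypothetical)."""

    code_set: str        # e.g. "ich_keyword_type_4" for products
    code_set_owner: str  # OID of whoever controls the code sets (ICH or an agency)
    status: str          # "active" is assumed here; "suspended" in a later sequence
    code: str            # sponsor-chosen code, e.g. "PRD-HLPS"
    code_owner: str      # OID of the sponsor/MAH keyword list
    display_name: str    # what a human reads, e.g. "Hungrilapse"

    def reference(self) -> tuple[str, str]:
        """Only the code and its owner are used in a Context of Use reference."""
        return (self.code, self.code_owner)


product = KeywordDefinition(
    code_set="ich_keyword_type_4",
    code_set_owner=ICH_OID,
    status="active",
    code="PRD-HLPS",
    code_owner=SPONSOR_OID,
    display_name="Hungrilapse",
)

# A later sequence suspends the keyword; the reference key is unaffected.
suspended = replace(product, status="suspended")
print(product.reference() == suspended.reference())  # True
```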

Not all keywords need to be defined, though: only those whose free text is supplied by the sponsor/marketing authorisation holder. Other keyword types are simply referenced by code and owner, because their values are defined by ICH or the agency and are expected to exist for all applications. These include:

  • Document Type (what was previously called the File Type in the Study Tagging File in the current eCTD)
  • Form Type (US FDA)
  • Promotional Material Type and Promotional Material Audience Type (US FDA)
  • Species, Duration, and Type of Control (also part of the Study Tagging File organization)
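Resolving a keyword reference therefore means looking in one of two places: the application’s own definitions for sponsor-owned codes, or a predefined vocabulary for ICH- or agency-owned codes. Here is a minimal sketch, assuming a single sponsor OID per application; every code and OID below is a placeholder, not a real controlled-vocabulary entry.

```python
# Hypothetical OIDs from the 2.999 arc reserved for examples.
ICH_OID = "2.999.10"
FDA_OID = "2.999.11"
SPONSOR_OID = "2.999.20"

# Predefined terms: placeholder codes, NOT real ICH or FDA vocabulary entries.
PREDEFINED = {
    ("example-species-rat", ICH_OID): "Rat",
    ("example-promo-audience", FDA_OID): "Example audience",
}

# Terms the sponsor had to define in the application's keyword definitions.
SPONSOR_DEFINED = {("PRD-HLPS", SPONSOR_OID): "Hungrilapse"}


def resolve(code: str, code_owner: str) -> str:
    """Look up a (code, code owner) reference in the appropriate source."""
    source = SPONSOR_DEFINED if code_owner == SPONSOR_OID else PREDEFINED
    try:
        return source[(code, code_owner)]
    except KeyError:
        raise KeyError(f"keyword {code!r} (owner {code_owner}) is not defined") from None


print(resolve("PRD-HLPS", SPONSOR_OID))         # Hungrilapse
print(resolve("example-species-rat", ICH_OID))  # Rat
```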

Using a keyword is just a matter of including it in a <keyword> element in the context of use, where you indicate the code and the code owner. I hope nobody ever has to read the XML by hand to debug it, but if they do, this is where readable codes that indicate their type come in handy, since the code type itself isn’t part of the context of use. A context of use might, for example, carry three keywords: one each for the Product, Dose Form and Manufacturer.
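For a rough idea of what generating such references might look like, the sketch below builds <keyword> elements with Python’s standard xml.etree.ElementTree. It is simplified and partly assumed. The child <code> element with code and codeSystem attributes follows the usual HL7 convention for coded values, where codeSystem carries the owner’s OID. The surrounding Context of Use markup is omitted, the codes and OID are hypothetical, and all three keywords are assumed to be sponsor-defined. Check the ICH eCTD v4.0 Implementation Guide for the exact structure.

```python
import xml.etree.ElementTree as ET

SPONSOR_OID = "2.999.20"  # hypothetical OID from the 2.999 example arc


def keyword_element(code: str, code_owner: str) -> ET.Element:
    """Build a simplified <keyword><code code="..." codeSystem="..."/></keyword>.

    Assumes the HL7 convention that codeSystem carries the code owner's OID;
    the enclosing Context of Use markup is deliberately left out.
    """
    keyword = ET.Element("keyword")
    ET.SubElement(keyword, "code", code=code, codeSystem=code_owner)
    return keyword


# Product, Dose Form and Manufacturer references (hypothetical codes,
# all assumed to be sponsor-defined).
for code in ("PRD-HLPS", "DF-TAB", "MFR-SMKPR"):
    print(ET.tostring(keyword_element(code, SPONSOR_OID), encoding="unicode"))
```

Nothing in the output records the code set (product, dose form or manufacturer); only the PRD-, DF- and MFR- prefixes give a human reader that clue.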

One impact here is that keyword codes need to be unique within an application. Because keywords are referenced only by code and owner, you can’t code the three products “PBJ1”, “PBJ2” and “PBJ3” and then code the two substances “PBJ1” and “PBJ2”: eCTD 4 will treat the product “PBJ1” and the substance “PBJ1” (and likewise the two “PBJ2”s) as the same code. That’s another good reason to use a recognizable prefix for the code. The plan for ARIM is to create a code for you with a standard prefix and let you edit it, while checking that it stays unique.
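Here is a sketch of that approach, assuming hypothetical prefixes per keyword type (PRD- and MFR- follow the examples above; SUB- is made up). It proposes a prefixed code from the display name, adds a numeric suffix if the code collides, and checks any user edit against the codes already in use. It illustrates the idea only and is not ARIM’s implementation.

```python
import re

# Hypothetical prefixes per keyword type: PRD and MFR follow the examples above,
# SUB is made up for illustration.
PREFIXES = {"product": "PRD", "manufacturer": "MFR", "substance": "SUB"}


def propose_code(kind: str, display_name: str, existing: set[str], length: int = 5) -> str:
    """Propose a prefixed code for a term and record it in `existing`.

    The stem is the first `length` letters/digits of the display name; if the
    resulting code is already taken, a numeric suffix (-2, -3, ...) is added.
    """
    stem = re.sub(r"[^A-Z0-9]", "", display_name.upper())[:length] or "X"
    base = f"{PREFIXES[kind]}-{stem}"
    candidate, n = base, 2
    while candidate in existing:
        candidate = f"{base}-{n}"
        n += 1
    existing.add(candidate)
    return candidate


def accept_edit(old_code: str, new_code: str, existing: set[str]) -> None:
    """Apply a user's edit to a proposed code, rejecting duplicates."""
    if new_code != old_code and new_code in existing:
        raise ValueError(f"code {new_code!r} is already used in this application")
    existing.discard(old_code)
    existing.add(new_code)


codes: set[str] = set()
print(propose_code("product", "PBJ1", codes))         # PRD-PBJ1
print(propose_code("substance", "PBJ1", codes))       # SUB-PBJ1
print(propose_code("product", "PBJ 1", codes))        # PRD-PBJ1-2
print(propose_code("product", "Hungrilapse", codes))  # PRD-HUNGR
accept_edit("PRD-HUNGR", "PRD-HLPS", codes)           # user prefers the shorter code
```

With the prefixes in place, the product “PBJ1” and the substance “PBJ1” no longer collide.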

In a bundled submission, it’s also important to define the same codes for each application, because the contextOfUse elements apply to all applications in the bundle. If a later submission applies to a different set of applications (or just one), the publishing and review systems need to know what metadata is available for that specific application at any given time.
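A publishing tool could check this before building a bundled submission. The sketch below, with hypothetical application names, codes and OIDs, reports for each application the sponsor-owned keywords that the bundle’s contexts of use reference but that the application never defines. ICH- or agency-owned codes are skipped because they need no definition.

```python
Ref = tuple[str, str]  # (code, code owner OID)

SPONSOR_OID = "2.999.20"  # hypothetical sponsor OID from the 2.999 example arc
ICH_OID = "2.999.10"      # hypothetical stand-in for an ICH code owner


def missing_definitions(defined_by_app: dict[str, set[Ref]],
                        referenced: set[Ref]) -> dict[str, set[Ref]]:
    """Map each application in the bundle to the sponsor keywords it fails to define.

    Only sponsor-owned references are checked; predefined ICH/agency terms
    need no definition.
    """
    sponsor_refs = {ref for ref in referenced if ref[1] == SPONSOR_OID}
    gaps = {}
    for app, defined in defined_by_app.items():
        missing = sponsor_refs - defined
        if missing:
            gaps[app] = missing
    return gaps


bundle = {
    "application-A": {("PRD-HLPS", SPONSOR_OID), ("MFR-SMKPR", SPONSOR_OID)},
    "application-B": {("PRD-HLPS", SPONSOR_OID)},
}
used = {("PRD-HLPS", SPONSOR_OID), ("MFR-SMKPR", SPONSOR_OID), ("example-doc-type", ICH_OID)}
print(missing_definitions(bundle, used))  # {'application-B': {('MFR-SMKPR', '2.999.20')}}
```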

Some technical notes:

  • The RPS standard supports defining keywords in bulk: a keyword definition may contain multiple <item> elements. However, ICH has limited eCTD 4 to defining keywords one at a time.
  • If the display name needs to change, the next submission only needs to redefine the same keyword, with the same code and code set, and mark that the display name has changed.
  • There is a keyword type I neglected to mention in the previous post: Group Code. It is used to associate a group of documents as a single document, such as when a long summary must be broken into multiple files. This makes it clear that the set of context of use items should be reviewed together, as opposed to being separate files that share the same context of use code and keywords, such as multiple manufacturing process documents. Many, but not all, of the context of use codes support the Group Code. Think of it as similar to both the Study Tagging File ID and a Node Extension in the current eCTD.
  • In the current eCTD, the Study ID and Study Title are separate items in the Study Tagging File. Because they are both associated with the same concept, ICH decided to require that they be placed in a single keyword definition of type “ich_keyword_type_8”, with the ID and title separated by an underscore. While this prevents one study’s ID from being accidentally paired with another study’s title, and slightly shortens the XML, it’s an odd thing to do.
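To see why the underscore convention is awkward, here is a minimal sketch that builds and parses the combined Study ID and Study Title string. Where exactly the combined string sits within the keyword definition is left out, and the study ID and title are hypothetical. The check that rejects underscores in a study ID is an added safeguard, not an ICH rule. Splitting at the first underscore only works if the ID contains none, while underscores in the title are harmless.

```python
def combine_study_keyword(study_id: str, study_title: str) -> str:
    """Join Study ID and Study Title with an underscore (ich_keyword_type_8 style)."""
    # Added safeguard, not an ICH rule: an underscore inside the ID would make
    # the combined value impossible to split back unambiguously.
    if "_" in study_id:
        raise ValueError(f"study ID {study_id!r} contains an underscore")
    return f"{study_id}_{study_title}"


def split_study_keyword(value: str) -> tuple[str, str]:
    """Recover (Study ID, Study Title) by splitting at the first underscore."""
    study_id, sep, study_title = value.partition("_")
    if not sep:
        raise ValueError(f"no underscore separator in {value!r}")
    return study_id, study_title


combined = combine_study_keyword("HL-101", "Dose_finding study of Hungrilapse")
print(combined)                       # HL-101_Dose_finding study of Hungrilapse
print(split_study_keyword(combined))  # ('HL-101', 'Dose_finding study of Hungrilapse')
```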

While the XML is very different, little will change in your publishing processes beyond the tasks mentioned last time: chiefly, the need to create a unique code within each application for every changeable metadata term. As you prepare for FDA’s planned eCTD 4 pilot next year, contact ACUTA for more information on how ARIM will help you improve your submission processes.


Photo credit Marco Verch

Author: Joel Finkle

Joel became embroiled in electronic submissions when regulatory came downstairs and asked "Can we convert all our clinical study reports to WordPerfect format for the FDA reviewer?" and he didn't say, "No." Since then, he's been involved with custom CANDAs, PDF publishing, eCTD, document template automation, Regulatory Information Management, HL7's RPS, and the ISO IDMP standard. He joined ACUTA in April of 2017. He'd share some of his famous tomatillo salsa with you, but he can't carry it on airplanes.
