Chapter 5: Content Types and Introduction to Data Modeling
Content types define the structure of your site. Before you dive into creating content types, it’s important to carefully think through the data model for your site. While there is no single perfect data model for a given site, some data models are more effective than others in supporting the kinds of browsing, searching, and visualization interfaces that you want to create. This chapter describes content types, the two content types that are configured by default on Backdrop sites, and some factors to think about when creating your data model. It concludes by applying these considerations to the example site.
On a Backdrop site, "content types" are how you store most of the site content, from small pieces of data that will never be shown to the end user in isolation (e.g. the name and geographic coordinates of a location that you want to refer to from another content type) to more complex entities with a lot of different properties (e.g. a project profile, including information about the developers, the funding agency, a description, website URL, links to relevant publications, etc.) to simple content that stands alone for visitors to the site (e.g. basic pages like "about the project", or blog posts). Creating content types in Backdrop is a straightforward process; making good choices about how to break down the material you want to present on the site into content types, and how to structure those content types, takes some careful deliberation.
One consequence of Backdrop's extreme flexibility and customizability is that a Backdrop site can easily become a convoluted mess, in a way that's simply not possible with WordPress or Omeka: it's not clear which modules, blocks, or views are generating any particular snippet of text on the screen, how one would add new data to the site, or why content is showing up in a particular order. The choices made around content types are generally the primary culprit when a Backdrop site descends into chaos, as the site developer adds more and more ad-hoc fixes to get the data to show up the way it needs to.
If a project team includes both technically-oriented members and scholarly-oriented members who generally work separately, planning the content types is one occasion that necessitates a high degree of collaboration and communication— in person, if possible. If a developer makes assumptions about whether to make a date field mandatory, or whether a person’s profile should have separate fields for given names and surnames, it can impact the kinds of scholarly arguments one can make with the data. Likewise, if a scholar doesn't make explicit their assumptions about what they'll be able to do with the data (e.g. "display all events that happened in the 19th century, and filter by person involved"), the content types may get structured in a way that doesn't support it. Restructuring content types can be time-consuming, even more so if data entry has already begun. Content types that are actively used will inevitably evolve to some extent, but investing work in getting the content types reasonably “right” for your project upfront will pay off considerably.
If you’ve installed Backdrop using the standard installation profile, your site will already have two content types: “Page” and “Post”, which are equivalent to the WordPress data structures with the same names. Both will have a title field and a body field; “Post” additionally has fields for an image, and for tags.
If all the information that you want to capture about the content in your collection could be captured by those two default content types, that is a strong indicator that Backdrop is not the right platform for your project. The payoff for the additional configuration work that Backdrop requires is its ability to easily capture dates, connections between different types of data, locations, etc. in a structured form that can be leveraged in a variety of ways for display and navigation. If this is not necessary for your project, WordPress is the better platform choice.
There is nothing special about either of the default content types. They’re not protected in any way: you can delete the Post content type, for instance, by going to Structure > Content types > Post > Configure and clicking the “Delete Content Type” button at the bottom. You can also add fields to them like any other content type.
For our example project, we could potentially just use the default content types without making any changes. We could use “Page” to store biographies of individual people, putting their name in the title field, and all the other information about them in the body field. “Post” could be used primarily to store images using the image field, with space for a title and description in the title and body fields, respectively, and a few descriptive tags.
Using the default content types in this way might be problematic for your project, even if the title/body field structure is generally a good fit. You might want to have an “about” page that provides an overview of your project, but if you’re also using the “Page” content type to store information about people, you’ll encounter some difficulties when using Views to generate a display of all people. Informational pages will be mixed in with the biographies (since both use the “Page” content type), and there’ll be no way for Backdrop to differentiate the two kinds of content, even though you can do so easily as a human. For this reason, it’s best to use the default content types as intended— for generally static information (“Page”) and blog posts, news, updates and the like (“Post”)— and create additional content types tailored specifically to your data, even if some of them look similar to “Page” or “Post”, using the title and body fields.
In the case of our example project, storing the data using just a title and body field would drastically limit what we could do with it. There would be no way to connect an image of a person to the biography of that person. There would be no way to generate a map or timeline view of the events in the person’s life, because that information would just be stored as part of a big text field. There would be no way to sort people alphabetically by last name (unless we indicated that names must be entered into the title field, last name first). Finding all people affiliated with Howard University would be no easier than if we were working on paper.
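To make the contrast concrete, here is a minimal sketch in Python (not Backdrop code) of what becomes possible once a person's surname and affiliations are stored as separate fields rather than inside one text field. The class name, the field names (`given_names`, `surname`, `affiliations`), and the two placeholder records are assumptions made for illustration; they are not part of the example site's data.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredPerson:
    """Hypothetical structured record; the field names are illustrative."""
    given_names: str
    surname: str
    affiliations: list[str] = field(default_factory=list)
    biography: str = ""


# With only a title and body, everything about a person is one blob of text.
title_and_body = {
    "title": "Alexander Thomas Augusta",
    "body": "A surgeon, professor of medicine, and veteran of the American Civil War.",
}

people = [
    # The next two records are invented placeholders, not real data.
    StructuredPerson("Sam", "Placeholder", ["Another Institution"]),
    StructuredPerson("Jane", "Example", ["Howard University"]),
    StructuredPerson("Alexander Thomas", "Augusta", ["Howard University"]),
]

# Sorting by surname is trivial once the surname is its own field.
by_surname = sorted(people, key=lambda p: p.surname)

# So is finding everyone affiliated with a given institution.
howard = [p for p in by_surname if "Howard University" in p.affiliations]

print([p.surname for p in by_surname])
print([p.surname for p in howard])
```

Neither operation can be done reliably against `title_and_body` without a person reading the text, which is the limitation described above.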
Not all data needs to be stored in a structured way: for every project, it’s important to consider what kinds of research questions, navigation options, and visualizations your site will support, and how much additional data entry work would be required for each new aspect of the data you’re considering encoding. Even the small amount of time it takes to put information about a person’s birth date into a specific date field can add up over tens or hundreds of individuals. Multiply this by the number of specific fields you wish to include in a content type, and it quickly becomes evident why a project team should carefully identify which fields are truly important for the research questions or displays they intend to support, and which would fail to provide a payoff commensurate with the data entry work required.
We’ve concluded that for our example site we need more -- and more elaborate -- content types than Backdrop provides by default. But how can you determine how many content types you actually need, or what fields they should have?
For most projects and data sets, there is no single correct data model: it all depends on what you plan to do with the data, and how you anticipate the project might evolve. Some degree of guesswork is needed, and you'll likely adjust your content types over time, but thinking through the following considerations before deciding on your initial set of content types should reduce the number of changes you’ll have to make later.
For sites that will be storing data, a rule of thumb is to construct one content type for each kind of data, and use fields within those content types for metadata (information about the data). Two projects based on the same data set may have different perspectives on what counts as data, and what counts as metadata, depending on how they want to present the content.
Suppose your data set includes the following:
- Full text of an interview, which includes references to important places
- Name of the interviewee
- Age of the interviewee
- Birthplace of the interviewee
How many content types should you create for this data? It all depends on the project goals and focus, and how you can see the project expanding.
If the focus is primarily on the text of the interview, perhaps one content type ("Interview") is enough. It can contain some fields to store the information about the interviewee. (The choice of which kinds of fields is a more complicated one, and is discussed further below.)
If the people being interviewed are themselves a potential focus-- of equal importance to the interview text-- you might want two content types: "Interview" and "Person". "Person" would contain the information about the person (name, age, birth place), and "Interview" would contain the text of the interview, and a node reference field pointing to the person being interviewed.
Technically, not much would change if you included the node reference field as part of the "Person" content type instead, and used it to point to the interview, but doing it that way feels a little less intuitive. There's something incomplete about an "Interview" content type that doesn't store information about who's being interviewed (even though you can use Backdrop’s Views to call up that information, if it's stored as part of the "Person" content type), but information about interviews is not an essential part of a stand-alone "Person" content type.
What if the focus of the project has more to do with locations? In addition to "Interview" (and "Person", if you're splitting that into its own content type), you would want to create a "Location" content type. You would then use a node reference field to store the person's birth place, and you might add a (possibly multi-valued) field to "Interview" to capture the important place or places mentioned in the interview text.
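The three-content-type version of this model can be sketched as plain Python data classes. This is only a way of picturing the relationships, not how Backdrop stores anything: the attribute names are assumptions, an attribute that holds another object stands in for a node reference field, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Location:
    title: str  # the name of the place


@dataclass
class Person:
    title: str                                # the person's name
    age: Optional[int] = None
    birth_place: Optional[Location] = None    # stands in for a node reference


@dataclass
class Interview:
    title: str
    body: str                                 # full text of the interview
    interviewee: Person                       # node reference to a Person
    # Multi-valued reference to the places mentioned in the text.
    places_mentioned: list[Location] = field(default_factory=list)


# Invented sample values, for illustration only.
town = Location("Example Town")
city = Location("Example City")
person = Person("Pat Example", age=70, birth_place=town)
interview = Interview(
    "Interview with Pat Example", "Full text goes here.", person, [town, city]
)

# Because the birth place and the places mentioned point at the same Location
# records, questions that span content types become simple lookups.
mentions_birth_place = interview.interviewee.birth_place in interview.places_mentioned
print(mentions_birth_place)
```

If the project only needed the single "Interview" content type, `Person` and `Location` would collapse into ordinary fields on `Interview`, and lookups like the last one would depend on place names being typed identically every time.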
The taxonomy system in Backdrop has some characteristics in common with content types. The simplest use of taxonomies involves creating different vocabularies, and either pre-populating them with a set of terms (which you may choose to organize hierarchically) or using them to store user-generated tags. By default, terms in any vocabulary have a name (the term itself) and optionally a description, but you can add fields on a vocabulary-by-vocabulary basis, which will then be available to all terms in that vocabulary (go to Structure > Taxonomy > Your-taxonomy-name > Manage fields to see the interface, which is nearly identical to the "manage fields" interface for content types).
The ability to add specialized fields to vocabularies can be useful. A project that invites users to list their institutional affiliation as part of their user profile could use a free-tagging vocabulary for institutions in the user profile, and have a project assistant augment every term added with the geographic coordinates of the institution, in order to generate a user map.
On the other hand, it does blur the line between content type and taxonomy, which complicates the process of data modeling. As a rule of thumb, if you're considering adding more than two fields to a vocabulary, think about why you're not creating a content type for it. If you're still torn between content type and taxonomy, here are a few more factors to consider:
- The content of a node reference field (which you can use to point to content stored in a content type) appears as a link to that single piece of content. For example, if you have a "Location" content type, and use a node reference field in a "Person" content type for the person's birth place, clicking on the person's birth place will take you to the corresponding “Location” node.
- The content of a term reference field (which you use to associate a taxonomy term with a piece of content) appears as a link to a page that primarily shows a list of all the content that's been assigned that taxonomy term. The definition, and/or any other fields you've associated with the vocabulary, appear at the top. Backdrop also automatically creates an RSS feed for content tagged with each taxonomy term. Therefore, if you use a vocabulary to store locations, clicking on the person's birth place will give you a list of all people with that birthplace, and any other content that's been tagged with the same location, such as interviews where that location is important.
- Editing taxonomy terms (or adding them, outside the context of adding a new tag when creating content) requires a different set of user permissions than creating/editing content via content types. Backdrop doesn't automatically have the fine-grained permission control for taxonomies that it does for content. Where Backdrop differentiates "edit own content in Content Type X", "edit any content in Content Type X", "delete own content in Content Type X" and "delete any content in Content Type X", there's an all-purpose "Edit terms in Vocabulary X" and "Delete terms in Vocabulary X". Depending on who's going to be entering the data, you might want the additional control offered by a content type.
- Views behaves in surprising, and generally undesirable, ways if you try to display data from a field stored in a taxonomy term as part of a view of node data. For instance, if you have a node reference field in an “Interview” content type pointing to a “Person” node, and you only want to display the person’s last name (which exists as a separate field in the “Person” content type) as part of a display listing all interviews and interviewees, you can easily substitute the last name field for the person’s first name. By contrast, if you have people stored as taxonomy terms, with separate fields for first and last name as part of the taxonomy term, trying to make that substitution will generate duplicate listings for any interview with more than one person associated with it.
- You can use Views to create a taxonomy-like display of all content that refers to certain kinds of nodes, using a node reference field. You can also use Views to create an RSS feed for all content associated with those nodes. Essentially, there's nothing unique about the way taxonomy content is presented by default-- you just have to do a little more work to get it.
In short, if there are factors that make it preferable to create a content type rather than an elaborate taxonomy vocabulary, but you prefer the way taxonomies display, go with the content type-- it's not terribly difficult to recreate a taxonomy-like display using Views.
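Conceptually, a taxonomy-like display built on node references is a reverse lookup: for each referenced node, list everything that points to it. The sketch below illustrates that idea in Python. It is not a Views configuration, and the content titles and the function name `listing_by_location` are invented for the example.

```python
from collections import defaultdict

# Each tuple: (content type, title, titles of the Location nodes it references).
# All values are invented placeholders.
content = [
    ("Person", "Pat Example", ["Example Town"]),
    ("Interview", "Interview with Pat Example", ["Example Town", "Example City"]),
    ("Person", "Lee Sample", ["Example City"]),
]


def listing_by_location(items):
    """Group content under each location it references, like a term page."""
    listing = defaultdict(list)
    for content_type, title, locations in items:
        for location in locations:
            listing[location].append(f"{content_type}: {title}")
    return dict(listing)


listing = listing_by_location(content)
for location, entries in listing.items():
    print(location, entries)
```

A taxonomy term page gives you this grouping automatically; with a content type, a view that filters on the node reference field does the equivalent work.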
What if you've looked at your data, broken it down into content types and fields, and discovered that some of the content types will have identical, or nearly-identical, sets of fields? An example might be content types for information about undergraduate students and graduate students: perhaps the "Grad Student" content type has an extra field for dissertation topic, but is otherwise identical in form to the "Undergrad" content type. You could set them up as two different content types, but that means doubling the configuration work, both now and in the future. It's extremely likely that, at some point, you'll need to add another field to the "Grad Student" or "Undergrad" content type, and if it's one that applies to both, you'll have to remember to update it in two places. If these content types are truly so similar, and you anticipate that they'll generally remain so, you might want to consider creating a single "Student" content type, and include a "Text (list)" field for selecting whether the student is a graduate student or undergrad. To accommodate a small amount of variation, you can use the Conditional Fields module (in development, beta code available here: https://github.com/backdrop-contrib/conditional_fields) to have the dissertation topic field show up only after "graduate student" has been selected.
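As a rough illustration of the single "Student" content type, the Python sketch below models a list field for the student's level and a dissertation topic that only applies to graduate students. It shows the logic of the data model only; it does not describe how the Conditional Fields module is implemented, and the names `LEVELS`, `level`, and `visible_fields` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values for the hypothetical "level" list field.
LEVELS = ("undergraduate", "graduate")


@dataclass
class Student:
    title: str                                  # the student's name
    level: str                                  # one of LEVELS
    dissertation_topic: Optional[str] = None    # graduate students only

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        if self.level != "graduate" and self.dissertation_topic is not None:
            raise ValueError("dissertation_topic applies to graduate students only")

    def visible_fields(self):
        """Fields an editing form would show for this student."""
        fields = ["title", "level"]
        if self.level == "graduate":
            fields.append("dissertation_topic")
        return fields


undergrad = Student("Pat Example", "undergraduate")
grad = Student("Lee Sample", "graduate", dissertation_topic="A placeholder topic")
print(undergrad.visible_fields())
print(grad.visible_fields())
```

Every new field that applies to both kinds of student is added once, to `Student`; only the exceptions need a condition.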
On the other hand, your site might evolve in a different direction, providing more (and more detailed) information about grad students than undergrads. How likely is it that, for example, you might want to provide a "short bio" field for your grad students (but not undergrads), or include the courses your grad students are teaching or TA-ing? Is the similarity between the "Grad Student" content type and the "Undergrad" content type more connected to the role that both play within the overall data set of your project, or is it because that aspect of your project (how much data to include, and/or what to do with it) isn't fully worked out yet? Moving data from one content type to another will be a pain regardless of whether you're splitting a content type, or merging two into one. At a certain point, you have to make your best guess about how your project will develop, and move on.
Usability is another factor to keep in mind here. "Undergrad" vs "Grad student" are easy enough to understand as options in a drop-down list as part of a "Student" content type. But what if your site stores the full text of 19th century poems, as well as the full text of user-contributed essays about those poems? Maybe in both cases, the only fields you have are the out-of-the-box title and body fields. Should you have one content type (where the user chooses "poem" or "essay" from a text list field), or two? In this case, chances are good that the users of your site think about the poems and essays as very different things, particularly if you're anticipating that scholars with a connection to this material will be contributing it. Creating a single content type might be easier for you to configure, but at the cost of making the site feel less intuitive to your users. In such a situation, it's often better to create two content types, especially if there's not a lot of field configuration work that would need to be done twice.
5.4.4 User profiles vs. content types
If the data on your site is primarily about people, you face a choice between storing that information in user profiles (see section 10.4) and using a content type. The determining factor here is whether the people in question will themselves be logging into the site and editing their information. If people expect to be able to edit information about themselves, it’s easiest to store it as part of the user profile. Each user automatically has permission to edit their own profile information; all you have to do is create a user account, and they can edit their information as soon as they log in. If you use a content type in a situation where people expect to be able to edit their own information, you have to create a user account so the person can log in, then create a node for the person, and set the person’s user account as the author of the node. Then, you either need to provide the user with the URL of the node you created with their information, or create a block using Views that will display nodes where the user is listed as the author, and put it somewhere visible. (See chapter 11 for more on blocks, and chapters 12 and 13 for more on Views.)
If the data on your site is primarily about historical people, or people who aren’t part of the community of site users, using a content type to store information makes the most sense. It becomes more complicated if the people who make up the site’s data are a mix of users and non-users. If a significant number of people are non-users, it may be better to use a content type, even though it requires more work to set up each user who is also a person in the database. Setting up user accounts that you don’t expect will ever be accessed is less than ideal from a security standpoint, and since you have to use a unique email address for every account on the site, generating a large number of unused email addresses that you control can quickly become cumbersome.
There are other Backdrop components (both modules and core functionality) whose settings are based around content types (i.e. different content types can have different setting configurations, but all content of a single type-- like undergrads and grad students within a "Student" content type-- must have the same configuration.) Some of these include:
- Default publishing settings: on a content-type-by-content-type basis, you can define whether content should automatically be published (appearing live on the site), saved as a draft, or scheduled for later publication.
- Permissions: the Backdrop permission system breaks down permissions by content type (add content type X, edit/delete own content type X, edit/delete all content type X). Using the example above, if you want some users to be able to create "undergrad" content, but not "grad student" content, that's a moderately strong factor in favor of different content types.
- URL alias patterns: if you want the automatically-generated URLs for two pieces of content to be radically different (e.g. "mysite.university.edu/2019/01/06/content-title" vs "mysite.university.edu/content-title"), that's one of the strongest factors in favor of different content types. Note that if you have just one example, or a handful of examples, of a piece of content you want to behave differently than the way you've configured the URL alias pattern for its content type, you can always manually change the URL.
- Hiding/displaying author and date information: if you want some content to display "Submitted by [user name] on [date]" at the top, and other content to not display it, it's a factor in favor of different content types. If you want to customize that text, though, you may need to use Views anyway, in which case you can set up more specific conditions than just content type for when that text shows up.
- Comments: if you want some content to have comments enabled by default, and other content to not have comments by default, that's a factor in favor of different content types. Note that users with the right permissions-- "Administer comments and comment settings" -- can turn on comments for any new or existing piece of content, regardless of what the defaults are.
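The URL alias case shows why such settings push toward separate content types: the pattern is chosen by content type, so two pieces of content of the same type cannot get different patterns automatically. The Python sketch below illustrates this. The pattern syntax here is Python's own format-string syntax, not Backdrop's, and `ALIAS_PATTERNS`, `slugify`, and `build_alias` are hypothetical names.

```python
import re
from datetime import date

# One pattern per content type (hypothetical syntax, not Backdrop's).
ALIAS_PATTERNS = {
    "post": "{created:%Y/%m/%d}/{slug}",
    "page": "{slug}",
}


def slugify(title):
    """Lowercase the title and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_alias(content_type, title, created):
    """Pick the pattern by content type and fill it in."""
    pattern = ALIAS_PATTERNS[content_type]
    return pattern.format(created=created, slug=slugify(title))


print(build_alias("post", "Content Title", date(2019, 1, 6)))
print(build_alias("page", "Content Title", date(2019, 1, 6)))
```

The lookup key is the content type alone, which is why content that needs a radically different URL structure generally needs its own content type.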
While content types can vary in the number and nature of their fields, every single content type must have the default “title” field. Every node (an instantiation of a content type) must have a Backdrop title. There are ways of hiding it from display, there are ways of automatically generating it, but there is no way to circumvent the fact that every node must have a title as Backdrop understands it.
By default, the Backdrop title is displayed at the top of the page when a user is viewing the node. In addition, if you have a content type that uses a node reference field (a field used for pointing to a different node), you'll have to use the Backdrop title of the node you want to reference. (There are ways around this by setting up a View that displays some other information from the nodes to be referenced, but using the Backdrop title, either via a select list or autocomplete, is by far the simplest.) Particularly if your site uses node reference fields, it's important to make good use of the Backdrop title, and this might require not thinking of it primarily as a "title".
Depending on the nature of the content type you're creating, the default Backdrop title field might be a good fit for capturing real-world titular data. If you have a "Project" content type, you can use the Backdrop title field as-is to store the project title. Leaving the Backdrop title field alone is also a good idea for the default “Page” and “Post” content types (or their equivalents), since web pages and blog posts tend to have titles.
What do you do with a content type that doesn't naturally have a "title" in the same way that a blog post or a project does (e.g. a dictionary word), or where the real-world "title" associated with the content type shouldn't be given the same prominence as some other piece of information (e.g. a content type for a person, where their name should appear most prominently, not their professional title)? Out of the box, Backdrop allows you to give the "title" field a different label on the node creation page (part of the "submission form settings" when you're editing a content type or creating a new content type, see below). For these two cases, you could change the label of the Backdrop title field to "Word" and "Name", respectively. Somewhat confusingly, in the second case, you might want to add a field that is labeled "title" to the person content type, to store the person's professional title.
Even though the Backdrop title field will be relabeled "Name", its machine name (how it's stored in the database-- which is never displayed to the end user and only appears on a few administrative pages) will still be "title". If you add a field to store the person's professional title, that field will be labeled "Title" but its machine name will be "field_title". Keeping an eye on the machine name, and whether it’s prefixed by “field_” will be important in a few situations where you’ll use the machine name as a placeholder for data stored in that field, such as when configuring URL alias patterns (see section 6.9).
What if there's no single piece of data in the content type that's a good fit for the Backdrop title? As an example, imagine a content type for storing brief weekly updates on a project. Backdrop will automatically capture the username of the person creating the update, and the date and time it was created. The default "body" field can be used to store the actual text of the update, but what of the title? Something like "1/1/16 Update (Jane)" might make sense, but if you use the Backdrop title field, the people working on the project would have to enter that information manually-- in spite of the fact that Backdrop is already capturing two of the three bits of information (the date and the username of the person adding the update). You could include instructions about your conventions for the titles of project updates, but that means extra work for your project assistants that's more likely to generate inconsistencies than actually be useful. A better way to generate useful Backdrop titles in such cases, without increasing the human work involved, is to use the Automatic Nodetitles module, described in section 6.8.
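The idea behind generating the title automatically is simple enough to sketch in a few lines of Python. This is only an illustration of assembling a title from data that has already been captured; it is not how the Automatic Nodetitles module is configured, and the function name `update_title` is an assumption.

```python
from datetime import date


def update_title(created, username):
    """Build a title such as "1/1/16 Update (Jane)" from captured data."""
    short_date = f"{created.month}/{created.day}/{created.year % 100:02d}"
    return f"{short_date} Update ({username})"


print(update_title(date(2016, 1, 1), "Jane"))  # prints: 1/1/16 Update (Jane)
```

Because the title is derived rather than typed, every update follows the same convention without anyone having to remember it.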
The example site described in section 1.5 will provide the context for most of the configuration work in this book. Let’s take a closer look at the data and the goals of the site.
The data for this project consists of the following:
- Biographies of African Americans in medical professions, and specific events from their lives, including temporal and spatial information about those events. These biographies are stored in a single Word file.
- Images of African Americans in medical professions, including (but not limited to) photos of people whose biographies will also be stored on the site. The images are sourced from the Flickr Commons (https://www.flickr.com/commons) collection of public domain images.
Example 1: Biography
Alexander Thomas Augusta (March 8, 1825 – December 21, 1890) was a surgeon, professor of medicine, and veteran of the American Civil War. After gaining his medical education in Toronto, he set up a practice there. He returned to the United States shortly before the start of the American Civil War. In 1863, he was commissioned as major and the United States Army's first African-American physician and also the first black hospital administrator in U.S. history. He left the army in 1866 at the rank of Brevet Lieutenant Colonel.
- Alexander Thomas Augusta was born, March 8th 1825, Norfolk, Virginia
- Applied to study medicine at the University of Pennsylvania, was rejected, 1845, University of Pennsylvania, Philadelphia, PA
- Moved to California to earn money for medical school, 1846, California, USA
- Married Mary O. Burgoin, January 12th 1847
- Enrolled at Trinity College at the University of Toronto, 1850, University of Toronto, Toronto, Ontario
- Received degree in medicine from the University of Toronto, 1856, University of Toronto, Toronto, Ontario
- Left Canada for the West Indies, 1860, West Indies
- Given a Presidential commission in the Union Army, October 1862, Washington, DC, USA
- Received a major's commission as surgeon for African-American troops. This made him the United States Army's first African-American physician (of a total of eight) and its highest-ranking African-American officer at the time., April 4th 1863, Washington, DC, USA
- Assaulted; three people were arrested, May 1863, Baltimore, Maryland
- Commissioned Regimental Surgeon of the Seventh U.S. Colored Troops, October 2nd 1863, Washington, DC, USA
- Wrote to Judge Advocate Captain C. W. Clippington about discrimination against African-American passengers on the streetcars of Washington, D.C.: "Sir: I have the honor to report that I have been obstructed in getting to the court this morning by the conductor of car No. 32, of the Fourteenth Street line of the city railway. I started from my lodgings to go to the hospital I formerly had charge of to get some notes of the case I was to give evidence in, and hailed the car at the corner of Fourteenth and I streets. It was stopped for me and when I attempted to enter the conductor pulled me back, and informed me that I must ride on the front with the driver as it was against the rules for colored persons to ride inside. I told him, I would not ride on the front, and he said I should not ride at all. He then ejected me from the platform, and at the same time gave orders to the driver to go on. I have therefore been compelled to walk the distance in the mud and rain, and have also been delayed in my attendance upon the court. I therefore most respectfully request that the offender may be arrested and brought to punishment.", February 1st 1864, Washington, DC, USA
- Awarded a brevet promotion to Lieutenant Colonel, March 1865, Washington, DC, USA
- Accepted an assignment with the Freedmen's Bureau, heading the agency's Lincoln Hospital in Savannah, Georgia., October 1866, Savannah, GA
- Left military service at the rank of Lieutenant Colonel, October 13th 1866, Washington, DC, USA
- Began teaching anatomy at Howard University, November 8th 1868, Howard University, Washington, D.C.
- Returned to private practice in Washington, D.C., 1869, Washington, DC, USA
- Received honorary MD from Howard University, 1869, Howard University, Washington, D.C.
- Attending surgeon to the Smallpox Hospital in Washington, D.C., 1870, Washington, DC, USA
- Received honorary AM from Howard University, 1871, Howard University, Washington, D.C.
- Stopped teaching anatomy at Howard University, July 1877, Howard University, Washington, D.C.
- Alexander Thomas Augusta died, December 21st 1890, Washington, DC, USA
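Each line in the list above carries the same three pieces of information: a description, a date of varying precision, and (usually) a location. The Python sketch below shows one possible way to picture that as structured data, using a handful of the events listed. It is an assumption-laden illustration rather than the data model this book settles on: the `Event` class, its field names, and the two helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    person: str
    description: str
    year: int
    month: Optional[int] = None      # some events are dated only by year
    day: Optional[int] = None
    location: Optional[str] = None   # some events have no location


PERSON = "Alexander Thomas Augusta"
events = [
    Event(PERSON, "Alexander Thomas Augusta was born", 1825, 3, 8, "Norfolk, Virginia"),
    Event(PERSON, "Married Mary O. Burgoin", 1847, 1, 12),
    Event(PERSON, "Received degree in medicine from the University of Toronto", 1856,
          location="University of Toronto, Toronto, Ontario"),
    Event(PERSON, "Began teaching anatomy at Howard University", 1868, 11, 8,
          "Howard University, Washington, D.C."),
    Event(PERSON, "Received honorary MD from Howard University", 1869,
          location="Howard University, Washington, D.C."),
]


def events_between(items, first_year, last_year):
    """Events whose year falls within the inclusive range."""
    return [e for e in items if first_year <= e.year <= last_year]


def events_at(items, place):
    """Events whose location mentions the given place name."""
    return [e for e in items if e.location and place in e.location]


print([e.description for e in events_between(events, 1840, 1859)])
print([e.description for e in events_at(events, "Howard University")])
```

Note that the sketch has to allow for missing months, days, and locations, because the source data does; decisions like that are part of the data modeling work.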