Then and Now: Sex and Gender Representation in Technical Specifications
Conscious Culture is a Bolt initiative to balance execution with humanity. Bolt, like so many other Conscious Culture companies, is a technology-driven company. As a result, we’ll occasionally bring you more technical discussions for your teams about how technology companies must apply the principles of a conscious company to strategic and tactical engineering decisions.
Technology is a product of humans, and it is shaped by our societal norms, definitions, and behaviors. As part of Conscious Culture’s support of #BreakTheBias for women and underrepresented groups, we share this deeper tech dive into how our assumptions and biases lead us to create products and systems that exclude and invalidate people, ultimately affecting our lives.
Here we consider how the cultural evolution of sex and gender has affected the way technology is implemented. On this lightning tour of sex and gender representation in code, let’s go back to 1973 and the publication of RFC 610 to see how these topics were discussed then, and follow them through to the present day. We’ll look at how updates to standards, such as those published by the IETF and ISO, attempt to rectify early exclusions, and end with thoughts on what’s next.
The Beginning: Determining Data Validity in Computer Systems
To talk about sex and gender representation in computer systems, we need to start at the beginning: RFC 610, published in 1973, states the foundational concept in Section 3.5 about as plainly as possible:
“In order for the datacomputer system to insure data validity, the user must define what valid data is.”
What did the authors decide to use as an example of “valid” data? Something from their lives: their sex and gender assumptions.
“[Another] case is where some relation must hold between members of an aggregate. For example, if the sex component of a struct is ‘male’ then the number of pregnancies component must be 0.”
For now, the point is not to debate the meaning of maleness or pregnancy but rather to show that sex and gender assumptions were foundational to the way engineers discussed “validity” of data, and how they wanted to literally encode representations of the world around them.
That is ultimately at the core of “ethical tech” and of what “ethics in tech” really means: questioning assumptions about reality, with the intent of decoupling properties that are otherwise tightly coupled wherever that coupling does not serve ethical ends.
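To make that coupling concrete, here is a minimal Python sketch of the RFC 610 example written as a record validator, alongside a decoupled alternative. The `Person` record and the `valid_1973` and `valid_decoupled` functions are hypothetical illustrations; RFC 610 describes the constraint in prose, not in this form.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record shape mirroring the "struct" in the RFC 610 example.
    sex: str
    pregnancies: int


def valid_1973(p: Person) -> bool:
    """Cross-field rule from the RFC 610 example: 'male' implies zero pregnancies."""
    if p.pregnancies < 0:
        return False
    if p.sex == "male" and p.pregnancies != 0:
        return False  # the two fields are tightly coupled here
    return True


def valid_decoupled(p: Person) -> bool:
    """Each field is checked on its own; nothing links sex to pregnancy."""
    return isinstance(p.sex, str) and p.pregnancies >= 0


record = Person(sex="male", pregnancies=1)
print(valid_1973(record))       # False: rejected by the coupled rule
print(valid_decoupled(record))  # True: accepted once the fields are decoupled
```

The decoupled version still rejects a value that is nonsensical on its own (a negative count), but it no longer encodes an assumption linking one field to the other.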
ISO 5218: The Next Pass at Coding for Sex and Gender
Fast forward three years, and the International Organization for Standardization (ISO) publishes its ISO 5218 standard, titled “Information technology—Codes for the representation of human sexes”. It’s 17 pages long and costs 88 Swiss francs (the franc is worth roughly $1 USD today) to buy, but, spoiler alert: it all basically boils down to this mapping, as republished on Wikipedia:
The four codes specified in ISO/IEC 5218 are:
- 0 = Not known;
- 1 = Male;
- 2 = Female;
- 9 = Not applicable.
The standard specifies that its use may be referred to by the designator “SEX”.
The standard explicitly states that no significance is to be placed on the encoding of male as 1 and female as 2; the encoding merely reflects existing practice in the countries that initiated this standard.
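As a sketch, the four codes map naturally onto a Python `IntEnum`. The `Sex` class and `parse_sex` helper are our own hypothetical names; the standard defines only the codes and the “SEX” designator.

```python
from enum import IntEnum


class Sex(IntEnum):
    """The four ISO/IEC 5218 codes. Per the standard, no significance is
    placed on male being 1 and female being 2."""
    NOT_KNOWN = 0
    MALE = 1
    FEMALE = 2
    NOT_APPLICABLE = 9


def parse_sex(code: str) -> Sex:
    """Parse a single-digit ISO/IEC 5218 code such as "2"; raise ValueError otherwise."""
    if len(code) != 1 or not code.isdigit():
        raise ValueError(f"not a single-digit code: {code!r}")
    return Sex(int(code))  # Enum lookup raises ValueError for digits outside {0, 1, 2, 9}


print(parse_sex("9").name)  # NOT_APPLICABLE
```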
Four codes is not a lot of options, and it is a far coarser representation of human biological sex variation than exists in reality, but the storage requirements are tiny. In 1976, storage efficiency mattered a lot; also, in 1976, we did not yet have the awareness and vocabulary we have today. There was no LGBTQIA+ acronym soup back then.
For a long time after its introduction, not much of interest happened: ISO 5218 was the go-to standard for everything about sex and gender representation. Note also that the two concepts (sex and gender) were heavily conflated, much like what we now refer to as “binary gender.” (We are oversimplifying here for brevity.)
vCard Protocols and Email Drive the Next Update
Fast forward to the mid-1990s, when offices and businesses were finally starting to digitize and hoping to realize IBM’s long-standing dream of the “paperless office.” In this era, the concept of “PDI,” or Personal Data Interchange, was formalized. Under the banner of the Internet Mail Consortium, at the time an industry association primarily promoting the use of email, protocol engineers designed what would later become two very important standard data formats that we use every day: vCard, for electronic business cards, and vCalendar, for electronic calendaring and exchanging scheduling information. These are the standards on which, for example, Google Contacts and Google Calendar are still based.
As an aside, Lisa M. Dusseault was one of the principal engineers of WebDAV, a set of extensions to HTTP that underlies CalDAV and CardDAV, the protocols through which much vCard and vCalendar (later, iCalendar) information is exchanged. Dusseault worked at Microsoft at the time and is still a distinguished engineer.
What was curious about vCard at the time is that it had no way of recording sex or gender information at all; that was simply omitted from the standard. Perhaps its designers were ahead of their time? But because people still badly wanted to record sex and gender information about humans, implementers used the convention of prefixing non-standard property names with X-, and a new de facto standard, the X-GENDER property on vCards, was born.
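A minimal Python sketch of what such a card, and a naive reader for it, might look like. The card contents, the value “Female”, and the `properties` helper are hypothetical; because X-GENDER was never standardized, each vendor could choose its own values.

```python
# A hypothetical vCard 3.0 card using the non-standard X-GENDER extension.
CARD = """BEGIN:VCARD
VERSION:3.0
FN:Ada Example
X-GENDER:Female
END:VCARD"""


def properties(card: str) -> dict[str, str]:
    """Very small parser: one NAME:value pair per line. It ignores line folding,
    parameters (e.g. TEL;TYPE=work), and escaping, all of which real vCards use."""
    props = {}
    for line in card.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.upper() not in ("BEGIN", "END"):
            props[name.upper()] = value
    return props


props = properties(CARD)
# X- properties are non-standard, so every consumer must agree on their meaning.
extensions = {k: v for k, v in props.items() if k.startswith("X-")}
print(extensions)  # {'X-GENDER': 'Female'}
```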
Implications of Limiting Assumptions
Meanwhile, in reality, gender is of course a rich galaxy of options, perhaps best illustrated by Yay! Genderform!, which gives you a huge number of options for describing yourself. As they say on their website:
“There are exactly 947 options here, and a total of 1.1896×10^285 or 1.1 quattruornovemgintillion possible combinations, more than there are elementary particles in the universe. If each option were a computer bit, it would take 119 bytes to encode a combination.”
Contrast that with ISO 5218, whose four mutually exclusive options fit in just two bits, and you get a sense of how different in scale the possibilities are that different encoding schemes allow.
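The arithmetic behind that comparison is easy to check. Choosing one of n mutually exclusive options needs ceil(log2 n) bits, while n independent yes/no options need n bits. A minimal Python sketch, with helper names of our own choosing:

```python
import math


def bits_for_choice(n_options: int) -> int:
    """Bits needed to store one choice among n mutually exclusive options."""
    return math.ceil(math.log2(n_options))


def bytes_for_flags(n_flags: int) -> int:
    """Whole bytes needed to store n independent on/off options, one bit each."""
    return (n_flags + 7) // 8


print(bits_for_choice(4))    # 2   (ISO 5218: four codes)
print(bytes_for_flags(947))  # 119 (Genderform: 947 independent options)
print(f"{2 ** 947:.4e}")     # number of possible combinations of 947 bits
```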
Another example of how limiting assumptions like those encoded in ISO 5218 ripple across computer systems, and thus society, was written up in a now-famous essay by Sam Hughes called “Gay Marriage: The Database Engineering Perspective,” also affectionately known as “the Gay2K problem.” In this essay, Hughes takes us on a journey through fourteen increasingly inclusive database schemas for recording marriages between humans.
At first, assumptions are baked right into the table names, which are called simply males and females. This is how a lot of marriage databases have been created, because it reflects society’s general attitude towards people and thus marriages. In other words, it’s a tangible artifact of people asking, “But which one’s the wife?”
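A minimal sketch of that starting point, using Python’s built-in `sqlite3` module. The table and column names (`males`, `females`, `marriages`, `husband`, `wife`) and the `naive_db` and `try_marry` helpers are our own illustration of the idea, not Hughes’s exact schema.

```python
import sqlite3


def naive_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
    CREATE TABLE males   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE females (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Sex is baked into the structure: a marriage is one male plus one female.
    CREATE TABLE marriages (
        id      INTEGER PRIMARY KEY,
        husband INTEGER NOT NULL REFERENCES males(id),
        wife    INTEGER NOT NULL REFERENCES females(id)
    );
    """)
    return conn


def try_marry(conn: sqlite3.Connection, husband_id: int, wife_id: int) -> bool:
    """Return True if the marriage row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO marriages (husband, wife) VALUES (?, ?)",
                     (husband_id, wife_id))
        return True
    except sqlite3.IntegrityError:
        return False


db = naive_db()
db.execute("INSERT INTO males (id, name) VALUES (1, 'Carl')")
db.execute("INSERT INTO females (id, name) VALUES (1, 'Alice'), (2, 'Beth')")
print(try_marry(db, 1, 1))  # True: Carl and Alice
print(try_marry(db, 2, 1))  # False: Beth (female id 2) has no row in males
```

Because sex lives in the table structure itself, a marriage between two women cannot even be expressed without changing the schema.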
The essay ends by proposing a schema that accounts not only for same-sex marriages but also for any number of marriages between any number of people. That would cover what are now commonly called non-monogamous, polyamorous, or ENM (ethical non-monogamy) relationships. Of course, because marriage in this context is a legal construction, this final schema would let a system record marriages that the law in most places does not recognize today. To stay within the law, a good deal of application-layer code would have to be written to make the system conform again to the legal regime that governs what a “valid” marriage is. Put another way, we are technically already capable of accommodating far more inclusivity than our systems generally allow in practice.
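A sketch of where that end state leads, again with `sqlite3`: a join table lets any number of people share a marriage and any person belong to any number of marriages, with no sex column at all. The table names and the `general_db`, `marry`, and `members` helpers are hypothetical, and nothing here checks legal validity; that would be the application-layer code described above.

```python
import sqlite3


def general_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE people    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE marriages (id INTEGER PRIMARY KEY);
    -- One row per (marriage, person): many people per marriage,
    -- many marriages per person.
    CREATE TABLE marriage_members (
        marriage_id INTEGER NOT NULL REFERENCES marriages(id),
        person_id   INTEGER NOT NULL REFERENCES people(id),
        PRIMARY KEY (marriage_id, person_id)
    );
    """)
    return conn


def marry(conn: sqlite3.Connection, *person_ids: int) -> int:
    """Record one marriage among the given people and return its id."""
    marriage_id = conn.execute("INSERT INTO marriages DEFAULT VALUES").lastrowid
    conn.executemany(
        "INSERT INTO marriage_members (marriage_id, person_id) VALUES (?, ?)",
        [(marriage_id, pid) for pid in person_ids],
    )
    return marriage_id


def members(conn: sqlite3.Connection, marriage_id: int) -> list:
    """Names of everyone in a marriage, sorted alphabetically."""
    rows = conn.execute(
        "SELECT p.name FROM marriage_members m JOIN people p ON p.id = m.person_id "
        "WHERE m.marriage_id = ? ORDER BY p.name", (marriage_id,))
    return [name for (name,) in rows]


db = general_db()
db.executemany("INSERT INTO people (id, name) VALUES (?, ?)",
               [(1, "Alice"), (2, "Beth"), (3, "Carl"), (4, "Dana")])
m1 = marry(db, 1, 2)     # a same-sex marriage
m2 = marry(db, 2, 3, 4)  # three people; Beth is in two marriages
print(members(db, m2))   # ['Beth', 'Carl', 'Dana']
```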
Today’s Standards for Sex and Gender in Technical Specifications
In code, we know that “tightly coupling” certain assumptions to one another can lead to brittle, hard-to-maintain systems. It turns out that writing gender-inclusive code is a prime example of how to make computer systems not only more user-friendly but also more resilient. And, finally, the standards bodies that define the protocols and other tools we use are starting to catch on.
Through the 2000s and into the early 2010s, advocacy around this topic grew. Eventually, this work resulted in an update to the earlier vCard standards, vCard version 4.0, which seems to be one of the earliest places where we see what has come to be understood as “best practice.” That best practice is twofold: first, separating sex from gender and, second, allowing gender to be defined as a plain-text field.
In section 6.2.7 of the IETF’s RFC 6350, we see, in augmented Backus-Naur form (ABNF) notation, what the GENDER property of a vCard object (which is supposed to represent a person’s “electronic business card”) can contain: a sex component, optionally followed by a free-form gender identity component. The sex component now has six possible values, up from the original four defined in ISO 5218: the five letters described below, or an empty value:
Sex component: A single letter. M stands for “male”, F stands for “female”, O stands for “other”, N stands for “none or not applicable”, U stands for “unknown”.
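To show the two-component design, here is a minimal Python parser for the value of a vCard 4.0 GENDER property. The `Gender` type and `parse_gender` function are hypothetical; the allowed letters follow the ABNF in RFC 6350 (`sex = "" / "M" / "F" / "O" / "N" / "U"`). Line folding, parameters, and backslash escaping in the text component are not handled.

```python
from dataclasses import dataclass

# Allowed sex-component values per the ABNF in RFC 6350, section 6.2.7.
SEX_VALUES = {"", "M", "F", "O", "N", "U"}


@dataclass(frozen=True)
class Gender:
    sex: str       # one of SEX_VALUES; "" means the component was left empty
    identity: str  # free-form gender identity text, possibly empty


def parse_gender(value: str) -> Gender:
    """Split a GENDER value (the part after "GENDER:") into its two components.
    The sex component is at most one letter, so the first ";" is the separator."""
    sex, _, identity = value.partition(";")
    if sex not in SEX_VALUES:
        raise ValueError(f"invalid sex component: {sex!r}")
    return Gender(sex, identity)


print(parse_gender("M"))                  # Gender(sex='M', identity='')
print(parse_gender("O;intersex"))         # Gender(sex='O', identity='intersex')
print(parse_gender(";it's complicated"))  # Gender(sex='', identity="it's complicated")
```

Keeping sex and gender identity in separate fields, with the latter as plain text, is exactly the two-part best practice described above.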
In February 2014, Facebook’s announcement that it would add a “Custom” option to its notoriously binary gender drop-down menu brought this conversation out of the margins and into the mainstream.
As of this writing, Facebook still asks its users for a binary gender assertion, which it says is used for ad targeting. Most advertisers still want this information gathered in the old binary way, despite its debatable usefulness for their purposes, and advertising is where Facebook makes its money. Facebook’s move was a nice facade over a system that is still binary underneath. Perhaps another discussion we should have is about advertising’s lack of progress toward nuanced, hyper-personalized marketing.
Today, these conversations are ongoing, and there are drafts, such as the “Feminism and protocols” document, that have been formally submitted to the IETF.
Ultimately, sex and gender representation in computer systems offers an illuminating look at how incorrect or outdated assumptions in data formats, codebases, database schemas, and other computer systems can have massive second-order effects. Those effects not only create maintenance nightmares but, if not handled carefully, can intentionally or unintentionally keep society from advancing its understanding of who we are, or can be, as people.