
How virus variants get their confusing names—and how to make them better

APRIL 21, 2021

BY AMY MCKEEVER, National Geographic

This transmission electron microscope image shows SARS-CoV-2, the virus that causes COVID-19, isolated from a patient in the United States. Virus particles are shown emerging from the surface of cells cultured in the lab.

Coronavirus variant names are strange and complicated. Sure, B.1.1.7 or P.1 might be perfectly fine names when virologists and microbiologists need to keep track of them—but they’re not so useful for the public trying to make sense of the variants driving new COVID-19 surges.

Take it from Salim Abdool Karim, an epidemiologist and former chair of South Africa’s COVID-19 advisory committee. He helped name the variant that was first discovered in the country: 501Y.V2, which, confusingly, is also known as B.1.351 and 20H/501Y.V2.

“Who wants to keep saying 501Y.V2?” Abdool Karim says. “501Y.V2 is such a mouthful to say. It’s a terrible name. You wouldn’t want to call your child 501Y.V2.”

Abdool Karim says it’s understandable that so many people have instead begun referring to the virus as “the South African variant.” But he is also one of many scientists who have criticized this practice, arguing that it is both stigmatizing and just plain inaccurate.

Soon, that might change. The World Health Organization has convened a committee of virologists to come up with a new naming system designed to resolve these issues. But why is it necessary? Here’s a look at how viruses and their variants typically get their names, the chaotic ad hoc naming system that emerged during the pandemic, and the historic pitfalls of naming viruses after the place where they were identified.

Why names matter

Many viruses have been named for the geographic regions where they were first identified, such as the Zika Forest in Uganda or the Ebola River in the Democratic Republic of the Congo. But this has historically also been stigmatizing to the communities from which the viruses derive their names.

“We know from past outbreaks, epidemics, and naming scandals that these things can have a real impact because that might be the only thing someone knows about that country, that this bad thing is coming from there,” says Emma Hodcroft, a molecular epidemiologist at the University of Bern in Switzerland. “So there’s a real effort in the scientific community to try to avoid using geographical names.”

In 2015, the WHO even issued guidance for naming infectious diseases that discouraged using geographic locations, human names, or animal species. Last year, the body also deliberately avoided any reference to China or Wuhan when it named COVID-19, which stands for coronavirus disease 2019. (More on how SARS-CoV-2, the virus that causes COVID-19, got its name in a bit.)

But Alexandre “Sasha” White, assistant professor of the history of medicine and sociology at Johns Hopkins University, points out that this hasn’t stopped anti-Asian sentiment from rising in the last year—with some help from prominent figures like former United States President Donald Trump, who insisted on referring to SARS-CoV-2 as the “China virus” or “Wuhan virus.”

“I have no doubt that the associations between COVID-19 and China and the stigma around that has been unfortunately critical to the rise in anti-Asian hate crime around the world,” he says. This is not exactly a new phenomenon. The spread of infectious disease has been a powerful force for justifying racism and xenophobia for centuries.

But there’s also a scientific argument for staying away from geographical names: Scientists point out that the names are misleading at best and totally inaccurate at worst.

The truth is that scientists don’t know where the so-called South African variant actually originated. Sure, the variant was first identified in South Africa, but researchers haven’t yet found patient zero. It’s possible that South Africa was just the first country to find the variant because it was doing more genetic sequencing than other countries.

Abdool Karim also says the label is misleading because the variant has spread throughout the world and is now more prevalent in places like the United States than it is in South Africa. “So you can see how crazy it is to call it the South African variant,” he says.

There are real consequences of using an inaccurate name, such as the U.S. ban on travel from South Africa, Brazil, and the United Kingdom earlier this year. The effects can also be long-lasting. It’s been more than a century since the 1918 influenza pandemic devastated the globe and, even though the first cases were recorded in the U.S., Hodcroft points out that many people still believe it originated in Spain because it became widely known as the Spanish flu.

How a virus gets its name

Although the WHO is responsible for naming diseases, viruses are named by a group of virologists and phylogeneticists that serve on the International Committee on Taxonomy of Viruses (ICTV).

In February 2020, the ICTV re-christened what was then called the 2019 novel coronavirus as SARS-CoV-2, which stands for severe acute respiratory syndrome coronavirus 2. Stanley Perlman, a microbiologist at the University of Iowa and a member of the ICTV study group for coronaviruses, says the group chose the new name because the virus’s genetic make-up was “clearly close” to the one that caused the SARS outbreak in 2003, which is called SARS-CoV.

But given all the pathogens in the world, the ICTV only names viruses at the species level and higher. So the process for naming variants begins much more informally among scientists—and will vary from pathogen to pathogen, says Hodcroft.

“There’s no rulebook for how you name your pathogen,” she says. Scientists essentially come up with a name and see if it gets adopted by the scientific community or if another name takes root instead.

One typical way to classify a virus is by its antigens—a piece of the virus that provokes an immune response and whose mutations are particularly important.

Influenza A, for example, has two prominent antigens, known as H (which stands for hemagglutinin) and N (which stands for neuraminidase). Every time those antigens mutate, they get assigned a new number—hence the name H1N1 for the most infamous pandemic influenza subtype. The virus has 18 different H mutations and 11 different N mutations that can be mixed-and-matched to form 198 potential combinations—although only 131 subtypes have been identified in nature.
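The arithmetic behind the 198 figure can be checked with a short sketch. It simply pairs every H number with every N number; the function name `subtype_names` is a made-up helper for illustration, and the counts of 18 and 11 are taken from the paragraph above.

```python
from itertools import product

H_TYPES = 18  # number of H antigen types, as stated in the article
N_TYPES = 11  # number of N antigen types, as stated in the article


def subtype_names(h_types=H_TYPES, n_types=N_TYPES):
    """Return every possible HxNy label, e.g. 'H1N1', by pairing each H with each N."""
    return [
        f"H{h}N{n}"
        for h, n in product(range(1, h_types + 1), range(1, n_types + 1))
    ]


names = subtype_names()
print(len(names))           # 18 * 11 possible combinations
print(names[0], names[-1])  # first and last labels in the enumeration
```

The list covers only what is combinatorially possible; as the article notes, far fewer subtypes have actually been found in nature.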

“All these viruses mutate all the time so we can’t be calling everything new names,” Abdool Karim says. “It’s only when they change an antigen that’s meaningful that we give it a new name.”

SARS-CoV-2, the virus that causes COVID-19, is mutating particularly rapidly and in so many ways both benign and dangerous—which Perlman says requires “a really intricate system of naming.” The trouble is that scientists have essentially had to do that on the fly—and have come up with several different systems, each with a different use.

The chaos of SARS-CoV-2 variants

In November 2020, researchers in South Africa sequenced a new and more transmissible SARS-CoV-2 variant, which included an N501Y mutation that allowed the spike protein to bind more tightly to human cells. This mutation replaces the asparagine (N) amino acid, typically found at position 501 of the spike protein, with tyrosine (Y). But before they could announce it to the public, the researchers first needed to figure out a name.

“We just sat down over a cup of tea and called it 501Y.V2,” Abdool Karim says. The first part of the name represents the most meaningful mutation of the virus, while V2 simply signifies that it is the second variant identified with that particular mutation. (The variant that was discovered in the U.K. is 501Y.V1 and the variant discovered in Brazil is 501Y.V3.)
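For readers who like to see the notation taken apart, here is a minimal sketch of how a label such as N501Y and a name such as 501Y.V2 relate to each other. The class `Substitution` and the functions `parse_substitution` and `variant_label` are hypothetical helpers written for this illustration, not part of any official tool.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str     # amino acid normally found at the position, e.g. "N"
    position: int     # position in the spike protein, e.g. 501
    replacement: str  # amino acid that replaces it, e.g. "Y"


# One letter, a run of digits, one letter: the shape of "N501Y".
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label: str) -> Substitution:
    """Split a label like 'N501Y' into its three parts."""
    match = _SUBSTITUTION.fullmatch(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return Substitution(original, int(position), replacement)


def variant_label(sub: Substitution, ordinal: int) -> str:
    """Build a '501Y.V2'-style name: position, new amino acid, then the variant's ordinal."""
    return f"{sub.position}{sub.replacement}.V{ordinal}"


n501y = parse_substitution("N501Y")
print(n501y)
print([variant_label(n501y, i) for i in (1, 2, 3)])
```

Note that the variant name drops the original amino acid (N) and keeps only the position and the replacement (Y), which is why N501Y becomes 501Y.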

But that’s not the variant’s only name. Several naming systems have arisen since the beginning of the pandemic—the two most prominent being Nextstrain and Pango. Although having more than one variant classification system might seem like overkill, these offer scientists different ways of analyzing the SARS-CoV-2 family tree.

Hodcroft says that the Nextstrain system, which she helped develop, is intended for scientists who want to look at the broader patterns on the virus’s family tree by assigning names to major genetic groupings, or clades, of the virus. It uses simple names that are based on the year the clade was identified, followed by a letter that’s assigned in alphabetical order. The root clade in the system is 19A, representing the viruses that were prevalent in China at the beginning of the outbreak.

However, Hodcroft says the limitations of the Nextstrain naming system became apparent when variants like 501Y.V2 began to drive regional outbreaks. Although they were technically not yet widespread enough to merit their own clade, she says these variants clearly needed to be identifiable. As a result, in this system, the variant of concern identified in South Africa is now named 20H/501Y.V2.

“It’s just because there’s no system for this,” Abdool Karim says. “It’s made as we go along. As we learn more, we change it.”
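The Nextstrain pattern described above (a two-digit year, a letter, and sometimes a variant label after a slash) can be sketched as a small parser. This is only an illustration: `NextstrainName` and `parse_nextstrain_name` are hypothetical names, and treating the two digits as a year in the 2000s is an assumption.

```python
import re
from typing import NamedTuple, Optional


class NextstrainName(NamedTuple):
    year: int               # year the clade was identified
    letter: str             # letter assigned in alphabetical order within that year
    variant: Optional[str]  # extra label after the slash, if any


# Two digits, one capital letter, then an optional "/label" suffix.
_NAME = re.compile(r"(\d{2})([A-Z])(?:/(.+))?")


def parse_nextstrain_name(name: str) -> NextstrainName:
    """Split a name like '19A' or '20H/501Y.V2' into year, letter and variant label."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a Nextstrain-style name: {name!r}")
    yy, letter, variant = match.groups()
    # Assumption: the two-digit year always refers to 20xx.
    return NextstrainName(2000 + int(yy), letter, variant)


print(parse_nextstrain_name("19A"))
print(parse_nextstrain_name("20H/501Y.V2"))
```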

Pango, meanwhile, takes a fine-grain approach to the SARS-CoV-2 family tree and has become the most commonly used system since it’s useful for tracking local outbreaks.

There are hundreds of lineages in this system, which is designed to reflect how the virus has evolved amid each new outbreak. It assigns new lineages not just based on significant mutations, but also includes other epidemiological events, such as if the virus jumped from one location to another.

“The fundamental principle is that the lineage names represent ancestry and descent,” says Oliver Pybus, an evolutionary biologist at the University of Oxford who helped design Pango.

Pybus says that every Pango lineage can be read essentially as a family tree. The earliest viruses that first circulated in China are denoted as lineages A or B. As they evolved and spread across the globe, their descendants are marked by a series of numbers. For example, B.1 includes the outbreak in northern Italy in early 2020 and is the first descendant of the B lineage to be named. Meanwhile the variant of concern identified in South Africa, named B.1.351, is the 351st descendant of the virus that caused that Italian outbreak.
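Reading a Pango name as a family tree can be sketched in a few lines: each dot marks one step of descent, so removing the last number gives the parent lineage. The helper names `ancestors` and `is_descendant` are invented for this example, and the sketch deliberately ignores the letter aliases that Pango also uses.

```python
from typing import List


def ancestors(lineage: str) -> List[str]:
    """List a lineage's ancestors, nearest first, by dropping one trailing number at a time."""
    parts = lineage.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_descendant(lineage: str, ancestor: str) -> bool:
    """True if `lineage` sits somewhere below `ancestor` in the tree."""
    # The trailing dot stops 'B.1.3' from matching 'B.1.351'.
    return lineage.startswith(ancestor + ".")


print(ancestors("B.1.351"))
print(is_descendant("B.1.351", "B.1"))
print(is_descendant("B.1.351", "A"))
```

Here B.1.351 reports B.1 and then B as its ancestors, matching the description of the variant as a descendant of the lineage behind the northern Italy outbreak.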

To keep these names from becoming too unwieldy, each Pango lineage can only have up to three dots in it. If the virus changes significantly after that, a new lineage begins under a different letter of the alphabet. That’s why the variant that was first identified in Brazil is called P.1 even though it is a descendant of the B.1.1.28 lineage.
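The three-dot rule works like an alias: a new letter stands in for a name that has run out of room. A rough sketch follows, assuming a lookup table with a single entry (P standing for B.1.1.28, as in the article's example). The table `ALIASES` and the function `expand_alias` are illustrative only; the real list of aliases is much longer.

```python
MAX_DOTS = 3  # the limit on dots described in the article

# Illustrative alias table with one entry taken from the article's example.
ALIASES = {"P": "B.1.1.28"}


def expand_alias(lineage: str) -> str:
    """Replace a leading alias letter with the full lineage it stands for."""
    prefix, dot, rest = lineage.partition(".")
    return ALIASES.get(prefix, prefix) + dot + rest


for short_name in ("P.1", "P.2", "B.1.351"):
    full_name = expand_alias(short_name)
    # The short form stays within the limit even when the full form would not.
    print(short_name, "->", full_name, "| dots in full name:", full_name.count("."))
```

Written out in full, a P lineage would need four dots, one more than the limit allows, which is why the shorter alias is used instead.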

Still confused? That’s because these naming systems are designed not to be easy to recall but to give scientists a common language in which they can discuss and investigate the evolution of SARS-CoV-2.

“As scientists we’re pretty used to these kinds of complicated names,” Hodcroft says. “We love to divide things up and name them.”

Virus variants typically don’t make national news. But now that some of these variants are driving the pandemic and dominating headlines, Hodcroft says there needs to be a way for non-scientists to keep track of them, too—and, ideally, not by using their geographic names.

Developing a new naming system

For all of these reasons, the WHO is stepping in to develop yet another naming system for the most worrisome virus variants. Although scientists will go on using their naming systems like Nextstrain and Pango—which Pybus says will also be announcing some organizational changes in the weeks ahead—the new WHO system is expected to make it easier for the public to keep track of the virus mutations that are threatening their communities.

Scientists have suggested treating the variants like tropical storms—creating a bank of names like Irene and Hugo to assign to each new variant. Hodcroft says the WHO could also potentially take a similar approach to how drug companies come up with product brand names, putting together two random syllables to create easy-to-pronounce names like Zoloft or Advil that are otherwise meaningless in any language. This system ensures that brand names are distinct and can be used globally.

Abdool Karim, who has seen the WHO’s new system, says he expects it to be unveiled shortly and confirms that it’s a departure from the current practice of using a jumble of letters and numbers. “I thought it was quite good,” he says.

Once the new system is unveiled, the challenge will be to get the public to actually take it up in place of the geographic variant names. Hodcroft says this is where the WHO can play a critical role in virus naming: If the body can bring a group of virologists together and get them to agree to use these names whenever they speak to the public, there’s a much better chance that the scientific community and the rest of the world will adopt the new system.

Either way, Abdool Karim says scientists have learned an important lesson for the inevitable next pandemic. “We’re learning that we need to have in place a name system early,” Abdool Karim says. “I think we’ll be proactive next time.”

Source: here

This National Geographic article cuts through the confusion created by the multiple names, and the jumbles of letters and numbers, used for the many new variants that have emerged during the COVID-19 pandemic. Are you confused by P.1, B.1.1.7, 501Y.V1, 501Y.V2, 501Y.V3, B.1.351, or B.1.1.28? I am. And should we stigmatise a country or region just because it was the first to sequence a newly discovered variant, when new variants are part of the natural evolutionary process of viral mutation? Yet that is how we ended up with the UK, South African, Brazilian, two Californian and now the double mutant Indian variants.

In naming the disease COVID-19, rather than the China or Wuhan virus, the WHO wished to avoid stigmatizing China, where the virus was first identified. Historically, this kind of stigma can be seen in the Spanish flu pandemic that began in 1918, where the first cases were actually recorded in the USA but the influenza virus was thought to have originated in Spain, forever associating that nation with the 50-100 million global deaths that ensued over a 3-4 year period.

Virologists and epidemiologists currently rely on two main naming systems for SARS-CoV-2 variants: Nextstrain and Pango. Both are designed to follow the virus’s lineage and family tree as it mutates. This article takes you through how these naming systems evolved, what they mean, and the pros and cons of each system.

There is hope on the horizon for non-scientists: a more easily understood, non-regional naming system is about to be unveiled as the WHO steps in to fix the confusion and reduce the misinformation surrounding COVID-19 variants and the impact of the vaccines that are currently available or in development.

- Doctor Donald Greig

