In June 2020, the genetic sequences of 241 virus samples taken in Wuhan at the start of the outbreak disappeared from an online scientific database. Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center in Seattle, reports that he was able to recover thirteen of them by analyzing files stored on Google Cloud.
These genetic sequences had disappeared from the Sequence Read Archive, an online database maintained by the National Institutes of Health (NIH). Dr. Bloom explains that he noticed their absence while analyzing a spreadsheet included in a study published in May 2020 in the journal PeerJ. In that study, the authors had cataloged 241 SARS-CoV-2 genetic sequences, dated up to the end of March 2020, that came from a Wuhan University project.
The spreadsheet indicated that these sequences had been uploaded to the database. However, when the researcher typed this information into the search bar, he found nothing.
In the course of his investigation, the researcher established that these 241 deleted sequences had been collected by Dr. Aisu Fu, of Renmin Hospital in Wuhan.
While browsing the scientific literature, Bloom then came across a preprint of another study, conducted in March 2020 by this same researcher. Published three months later in the journal Small, this study focused on specific virus mutations isolated from 45 nasal swabs taken from outpatients in Wuhan (no area specified) at the start of the outbreak (no date specified).
For Dr. Bloom, these samples were probably the source of the 241 missing sequences. He then realized that the Sequence Read Archive backed up its data on Google Cloud. Thanks to these backups, he explains, he was finally able to recover thirteen of the 241 deleted sequences.
This new study, which has not yet been peer-reviewed, naturally raises the question of why the original sequences were removed. For Jesse Bloom, the researchers had "no scientific reason to do so". According to him, it therefore seems probable that these sequences were "deleted to hide their existence".
These data could prove invaluable in efforts to trace the origin of the virus. To determine it, researchers must isolate the "first virus" from which all the other strains descended. So far, the earliest sequences had mainly been sampled from cases recorded in December 2019 at the Huanan market in Wuhan, long considered the starting point of the epidemic. However, the first genetic sequences isolated from this market included three mutations that are absent from certain virus sequences sampled weeks later outside the market. The sequences recovered by Jesse Bloom also lack these mutations.
In the Times, the researcher points out that this new finding neither strengthens nor rules out the hypothesis that SARS-CoV-2 may have leaked from a P4 laboratory. Incidentally, viruses lacking these three mutations more closely match the coronaviruses found in horseshoe bats, which many suspect are the source of the pandemic.
On the other hand, it reinforces earlier suggestions that the virus may have been circulating in Wuhan before the first official reports in December 2019, since the sequences from the market had already mutated. The conditional is warranted because it is not known precisely where and when the samples behind these recovered genetic sequences were collected. Yet this information is crucial for tracing the virus to its origin.