
Policies on Release of Human Genomic Sequence Data

Bermuda-Quality Sequence

Summary of Principles Agreed at the First International Strategy Meeting on Human Genome Sequencing
(Bermuda, 25-28 February 1996) as reported by HUGO

The following principles were endorsed by all participants. These included officers from, and scientists supported by, the Wellcome Trust, the UK Medical Research Council, the NIH NCHGR (National Center for Human Genome Research), the DOE (U.S. Department of Energy), the German Human Genome Programme, the European Commission, HUGO (Human Genome Organisation) and the Human Genome Project of Japan. It was noted that some centres may find it difficult to implement these principles because of legal constraints and it was, therefore, important that funding agencies were urged to foster these policies.

Primary Genomic Sequence Should be in the Public Domain

It was agreed that all human genomic sequence information, generated by centres funded for large-scale human sequencing, should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society.

Primary Genomic Sequence Should be Rapidly Released

Sequence assemblies should be released as soon as possible; in some centres, assemblies of greater than 1 Kb would be released automatically on a daily basis. Finished annotated sequence should be submitted immediately to the public databases.

It was agreed that these principles should apply for all human genomic sequence generated by large-scale sequencing centres, funded for the public good, in order to prevent such centres establishing a privileged position in the exploitation and control of human sequence information.

Coordination

In order to promote coordination of activities, it was agreed that large-scale sequencing centres should inform HUGO of their intention to sequence particular regions of the genome. HUGO would present this information on their World Wide Web page and direct users to the Web pages of individual centres for more detailed information regarding the current status of sequencing in specific regions. This mechanism should enable centres to declare their intentions in a general framework while also allowing more detailed interrogation at the local level.

Summary of the Report of the Second International Strategy Meeting on Human Genome Sequencing
(Bermuda, 27th February - 2nd March, 1997) as reported by HUGO

Summary

The principles enunciated at the first International Strategy meeting, of rapid data release and public access to the primary genomic sequence, were reaffirmed. Scientists and funding agencies should take the necessary steps to ensure that the principles are adhered to by all participating organisations.

Sequence Quality Standards

The following standards were agreed:
The nucleotide error rate should be 1 error in 10,000 bases or less for most sequence.
Assemblies should be verified by restriction digest using two or more restriction enzymes.
Gaps in sequence: the agreed long term goal is no gaps, recognising that this is not yet routine. Closing gaps is the responsibility of the original sequencer.

The following proposals were endorsed by the participants:
It was agreed that a useful trial to assess sequence accuracy would be to perform a data exchange exercise. Raw sequence data would be exchanged among sequencing centres, centres would reassemble the data and identify outright discrepancies or ambiguities with reference to the sequence submitted to the database. These would be resolved by further consultation or resequencing. The same data sets would be sent to two centres which would hopefully engender competition to detect errors.
All sequence reads should be archived in a retrievable form.
Sequencing centres should define explicitly how error rates and costs have been calculated.

Sequence Submission and Annotation

Sequence data should be classified simply as "finished" or "unfinished" and should be stored in distinct databases; consideration should be given to establishing a public database for unfinished sequence data.
Sequence annotation should be standardised if possible, and include the following information:

Error estimation such as PHRED and PHRAP data.
Enzymes used to verify assemblies, and sizes of fragments produced.
Exact details on how to assemble adjacent clones, with a minimum of 100 bp of overlapping (preferably unique) sequence between clones for verification.
Gaps must be sized and the surrounding sequence oriented and ordered. The methods used for sizing, and reasons for not closing the gap, should be stated.
If features such as coding sequence and splice sites are included in the annotation, it should be stated if they were identified experimentally or by computer predictions.
Unfinished sequence: it should be stated how near the sequence is to completion.
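As an illustration of the accuracy standard of 1 error in 10,000 bases, the sketch below converts an error probability to the Phred quality scale (Q = -10 log10 P), on which that standard corresponds to roughly Q40. The function names are illustrative and are not taken from the Bermuda reports; the compliance check simply compares a count of detected errors against the agreed ceiling.

```python
import math

# Agreed ceiling from the second Bermuda meeting: 1 error per 10,000 bases.
BASES_PER_ALLOWED_ERROR = 10_000


def phred_quality(error_probability: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality score."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def meets_accuracy_standard(errors_found: int, bases_checked: int) -> bool:
    """True if the observed error rate is at most 1 in 10,000 bases.

    Integer arithmetic is used so the comparison is exact.
    """
    if bases_checked <= 0:
        raise ValueError("bases_checked must be positive")
    return errors_found * BASES_PER_ALLOWED_ERROR <= bases_checked


if __name__ == "__main__":
    print(round(phred_quality(1e-4)))            # quality score for 1 error in 10,000
    print(meets_accuracy_standard(12, 150_000))  # 12 errors found in a 150 kb clone
```

Under these assumptions, a 150 kb clone could contain up to 15 detected errors and still meet the standard.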

Potential development of a database listing all gaps in 'finished' sequence.

Sequence Claims and Etiquette

Mapping investment does not automatically entitle sequencing claims over the same region until a sequence ready map has been generated.
Potential conflicts with other sequencers to be resolved by early communication.
Collaborations with groups with a biological interest in a region should be subject to the same principles of data release and communication.
Investigate whether the Human Sequence Map Index should be relocated to be more closely associated with the other major human sequence databases.
Claims allowed on the Index:

Duration - maximum 1 year.
Size of region - minimum 1 Mb; regions to be defined by Généthon markers if possible, other agreed and available markers if not.
Maximum amount - in the order of three times the sequence released by the centre in the preceding year.
Sequence claims must span the entire region between, and including, the delimiting markers.
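The claim limits above can be read as a short checklist. The following is a minimal sketch of such a check, not a description of how the Human Sequence Map Index actually processed claims. The `Claim` record, the function name, and the reading of "maximum amount" as a centre's total claimed sequence are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the claim rules above; the constant names are illustrative.
MAX_DURATION_DAYS = 365      # "maximum 1 year", assumed here to mean 365 days
MIN_REGION_BP = 1_000_000    # "minimum 1 Mb"
CLAIM_MULTIPLE = 3           # "in the order of three times" last year's release


@dataclass
class Claim:
    """A hypothetical record of one sequencing claim on the Index."""
    region_bp: int       # size of the claimed region, in base pairs
    duration_days: int   # how long the claim is meant to stand


def claim_problems(claim: Claim, total_claimed_bp: int,
                   released_last_year_bp: int) -> List[str]:
    """Return the rules a claim appears to break (empty list if none).

    total_claimed_bp is assumed to be the centre's total claimed
    sequence, including this claim.
    """
    problems = []
    if claim.duration_days > MAX_DURATION_DAYS:
        problems.append("duration exceeds 1 year")
    if claim.region_bp < MIN_REGION_BP:
        problems.append("region smaller than 1 Mb")
    # The source says "in the order of", so treat this as a soft guideline.
    if total_claimed_bp > CLAIM_MULTIPLE * released_last_year_bp:
        problems.append("total claims exceed about 3x last year's release")
    return problems
```

For example, a centre that released 2 Mb in the preceding year could, on this reading, hold claims totalling about 6 Mb.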

NHGRI Policy for Release and Database Deposition of Sequence Data
December 21, 2000

The National Human Genome Research Institute's (NHGRI) policy for release and deposition of DNA sequence data was devised to make sequence data available to the research community as soon as possible for free, unfettered use. To achieve this objective, NHGRI adopted as policy a practice that the sequencing community imposed on itself, that data were to be deposited in a public database within 24 hours of generating a sequence assembly of 2 kb or larger. Data release according to this practice is far more rapid than the standard scientific practice of releasing data only upon publication. In general, this practice has been enormously successful and has achieved its objective - the number of individual research projects that have used genomic sequence data generated by the public sequencing effort is already very large, even though a paper describing the entire genomic sequence has yet to be published. However, the policy now needs to be updated for two reasons - as originally stated, it does not address certain issues, and sequencing practices have advanced beyond the specific scope it addressed. Therefore, the NHGRI statement of policy for release and deposition of sequence data is being updated.

A. Continued applicability of current policy.

The current NHGRI policy on sequence data release (March 7, 1997) was developed early in the sequencing of the human genome and, as written, applies just to early stage data. Thus, the policy only addresses the release of the initial sequence assemblies, calling for the submission of sequence of 2 kb or longer to GenBank within 24 hours of assembly. This current policy will remain in effect, but is extended as described under B.

B. Extension of sequence data release policy.

1. Data generated during finishing of working draft sequence.

During the upgrade of working draft to finished sequence, both additional shotgun data and "finishing reads" will be acquired and assembled with the working draft data to produce finished sequence. As the additional data are incorporated, the new assemblies will often contain only minor changes from the existing working draft. It does not necessarily make sense to require a new submission within 24 hours every time a new assembly is done. At the International Strategy Meeting on Human Genome Sequencing at Cold Spring Harbor in May 2000, the participants agreed that it would be sufficient for the policy to call for updating accessions within 24 hours of a significant change, with the decision as to what a "significant" change is to be left to the sequence producer. Some examples of significant changes include achievement of full shotgun coverage of a clone, definitive closure of a sequence gap with concomitant reduction in the number of contigs, and finishing the sequence. NHGRI concurs with this recommendation and adopts it as part of its Data Release policy.

2. Whole genome shotgun data.

Increasingly, sequencing a large genome will involve a strategy that combines whole genome shotgun sequencing data with map-based framework data. In this approach, data producers will not start to assemble sequence until a significant amount of data has been collected -- this will likely be several months after data collection begins, and may be as much as a year later. However, the individual sequence reads or read pairs will be immediately useful to biologists for many purposes, e.g. in the annotation of the human sequence and in studying other genomes.
Making these data publicly available prior to their incorporation into sequence assemblies would be consistent with the objectives of the NHGRI approach to data release. However, such very early release must also recognize the widely accepted ethic in the scientific community that those who generate the primary data freely should have both the right and responsibility to publish the work in a peer-reviewed journal.

NHGRI believes that a reasonable approach is to recognize the opportunity and responsibility for sequence producers to publish the sequence assembly and large-scale analyses, while not restricting the opportunities of other scientists to use the data freely as the basis for publication of all other analyses, e.g. of individual genes, gene families and other projects at a more limited scale. To date, in many cases, the sequencing laboratory that has produced the data involved in a particular analysis was acknowledged and was actually a collaborator on some projects. This is a reasonable practice and NHGRI encourages its continuation.

In summary, to achieve a balance between the interests of the scientific community and those of the sequence producers, NHGRI adopts the following policy: Sequence trace data, and all ancillary information specified in a standard format provided by the database, should be released weekly into the NCBI Trace Repository. The information deposited will consist of the sequence trace and ancillary data. The submissions to the Trace Repository will carry the following notice:

"As a public service to the biological research community, these data are being made available by the sequence producers before assembly and before scientific publication. Once deposited, but prior to the publication of the complete sequence of the relevant genome, the data are available to all as follows:
1. The data may be freely downloaded by all users, for use in all types of analyses (with the single exception described in item 4).
2. The data may be repackaged in other databases, provided that appropriate acknowledgement is given.
3. Users are free to use the data for publication in scientific papers analyzing particular genes and regions; the source of the DNA sequence data should be appropriately acknowledged.
4. The producing laboratories intend to publish the sequence of the genome and certain large-scale analyses of the sequence in a timely manner upon the completion of sequence data acquisition. Therefore, the sole exception to the unrestricted use of these unpublished data is that the data may not be used for the initial publication of the complete genome sequence assembly or other large-scale analyses. In this context, "large-scale" refers to regions the size of the whole genome or individual chromosomes and examples of "large-scale analyses" include identification of regions of evolutionary conservation across an entire genome and identification of complete sets of genomic features such as genes, repeat structures, GC content, etc. The producing laboratories will, however, be open to the possibility of collaboration on such assemblies or analyses."
5. Any redistribution of the data should carry this notice.

Current NHGRI Policy for Release and Database Deposition of Sequence Data
March 7, 1997

At the Second International Strategy Meeting on Human Genome Sequencing (Bermuda, 1997), attendees affirmed the principle that was set out at the First (1996) International Strategy meeting, that primary genomic sequence should be rapidly released.
Specifically, the report of the first meeting stated that "sequence assemblies should be released as soon as possible; in some centres, assemblies of greater than 1 kb would be released automatically on a daily basis." The discussions at the 1997 meeting confirmed NHGRI's conclusions that it is extremely important for its large-scale sequencing program to be functioning in a manner consistent with this principle, that such rapid release is technically feasible, and that such unfinished DNA sequence data have already been found to be useful by the larger scientific community. NHGRI has determined, therefore, that its grantees engaged in large-scale genomic DNA sequencing should now be automatically releasing sequence assemblies of 2 kb or larger within 24 hours of their generation. (The trigger for data release is 2 kb, instead of 1 kb, in order to ensure that the released sequence be comprised of at least two sequence reads. Investigators who wish to release smaller assemblies may do so.)

Any laboratory funded by NHGRI for large-scale human genomic sequencing must develop and submit to NHGRI a plan to implement such a data release program, which must be implemented within one month of its being approved by NHGRI. No non-competing or competing renewal will be funded until an acceptable plan has been approved. Mandatory data release as described above will be made a condition of the award for any grant funded by NHGRI for large-scale human sequencing.
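The release trigger in the 1997 policy (assemblies of 2 kb or larger, released within 24 hours of generation) can be expressed as a simple rule. The sketch below is only an illustration of that rule as stated in the text; the function and constant names are hypothetical, and it does not describe any software that sequencing centres actually used.

```python
from datetime import datetime, timedelta
from typing import Optional

# Values stated in the 1997 NHGRI policy; the names are illustrative.
RELEASE_THRESHOLD_BP = 2_000          # "2 kb or larger"
RELEASE_WINDOW = timedelta(hours=24)  # "within 24 hours of their generation"


def release_deadline(assembly_length_bp: int,
                     assembled_at: datetime) -> Optional[datetime]:
    """Return the latest release time for an assembly, or None if the
    assembly is below the mandatory-release threshold.

    Smaller assemblies may still be released voluntarily; the policy
    simply does not require it.
    """
    if assembly_length_bp < RELEASE_THRESHOLD_BP:
        return None
    return assembled_at + RELEASE_WINDOW


if __name__ == "__main__":
    generated = datetime(1997, 3, 7, 9, 0)
    print(release_deadline(2_500, generated))  # due 24 hours after assembly
    print(release_deadline(1_200, generated))  # below threshold: None
```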

Science 16 February 2001: Vol. 291. no. 5507, p. 1192 DOI: 10.1126/science.291.5507.1192

News Focus Bermuda Rules: Community Spirit, With Teeth


Eliot Marshall

The "Bermuda Rules" may sound like standards for lawn tennis, but in fact they are guidelines for releasing human sequence data. Established in February 1996 at a Bermuda meeting of heads of the biggest labs in the publicly funded genome project, the rules instruct competitors in this cutthroat field to give away the fruits of their research for free. "The whole raison d'être for the communal effort was to get useful tools into the hands of the scientific community as rapidly as possible," says Francis Collins, director of the U.S. National Human Genome Research Institute in Bethesda, Maryland. But the rules also offer another benefit: They discourage the patenting of genes by sequencing labs, an activity executives of big pharmaceutical companies seem to despise as much as some academics do. The insistence on quick, unconditional release of data also lies at the heart of the dispute between publicly funded genome scientists and the private company that has just produced a draft version of the human genome, Celera Genomics of Rockville, Maryland.

[Photo caption: Care to share? NIH's Francis Collins is a strong advocate of rapid data release. Credit: Rick Kozak]

At the 1996 Bermuda gathering sponsored by the Wellcome Trust, a British charity that funds large-scale sequencing at the Sanger Centre in Hinxton, U.K., scientists agreed to two principles. First, they pledged to share the results of sequencing "as soon as possible," releasing all stretches of DNA longer than 1000 units. Second, they pledged to submit these data within 24 hours to the public database known as GenBank. The goal, according to a memo issued at the time, was to "prevent ... centers from establishing a privileged position in the exploitation and control of human sequence information."

The Bermuda policy, which replaced a 1992 U.S. understanding that such data should be made public within 6 months, has had a significant impact on the field. For example, Collins claims, it has already enabled the identification of more than 30 disease genes.

Both Collins and Ari Patrinos, director of the U.S. Department of Energy's office that funds genome research, backed the Bermuda push for openness. "We felt it would strengthen international cooperation," Patrinos says. "Scientists are by their very nature hoarders. They're chewing on the data all the time, and they never think they're ready" to let go, he adds. By adopting this formal mechanism, members of the consortium assured each other that no one would be squirreling away caches of data or quietly patenting genes. The policy also delivered a clear symbolic message, Patrinos says: "We all believe that the genome belongs to everybody."

When sequencers met in Bermuda again in 1997, they reaffirmed their pledge and added an explicit directive against patenting newly discovered DNA. Failure to cooperate, U.S. officials made clear, could be a black mark in future grant reviews. Although the message seemed to challenge private DNA databases by undermining their claims to exclusivity, large pharmaceutical firms welcomed it, because they would benefit if there were fewer patent holders to buy off.

Alan Williamson, a former executive at Merck, the pharmaceutical giant in Whitehouse Station, New Jersey, embraced the policy enthusiastically. "Putting data out immediately was a good thing," he says, because it encouraged the sharing of research tools without letting legal contracts get in the way. But he wishes sponsors of this research had taken active steps to make it difficult for others to patent and sell this genetic information--for example, by filing their own noncommercial patent claims that might block other claimants. Biomedical companies, he argues, should compete on the commercially difficult work--developing drugs--not on profiting from research tools such as DNA databases.

Indeed, Merck was so certain that this was the right approach that beginning in 1994, the company poured tens of millions of dollars into creating a nonprofit database of gene fragments known as expressed sequence tags (ESTs). The Merck Gene Index, as it is called, was designed to counter privately owned genetic databases and a surge in gene patenting led by such companies as Human Genome Sciences in Rockville, Maryland, and Incyte Pharmaceuticals in Palo Alto, California.
These companies sell genetic information, patent uses for newly discovered genes, and seek to obtain royalties for the use of their patents--by big pharmaceutical firms and all other users. Merck also contributed to a free database of mouse ESTs, which are useful in identifying human disease genes.

In a similar defensive move, 10 companies joined with the Wellcome Trust in 1999 to create a nonprofit database of human genetic variations garnered from the genome, known as single-nucleotide polymorphisms (SNPs). SNP maps may be extremely valuable someday in identifying disease genes and standardizing gene-based medical therapy, and several companies had already begun to gather them in private collections.

Quarreling over the principles of the Bermuda Rules broke out again when Celera announced that it would sequence the entire human genome. Its business plan, according to president J. Craig Venter, is to collect and process genomic data more efficiently than research outfits can do for themselves. The company would appear to have no incentive to give information away, but Venter grabbed headlines in 1998 when he declared that he would finish a rough draft of the genome earlier than the publicly funded effort and give everyone free access to Celera's sequence. Ever since then, Venter and the advocates of the Bermuda Rules have been arguing about what "free access" means.

Competition between the Public and Private Sectors


Dr. Craig Venter, a scientist at the NIH, felt that private companies could sequence genomes faster than publicly funded laboratories. For this reason he founded a biotechnology company called the Institute for Genomic Research (TIGR). In 1995 TIGR published the first completely sequenced genome, that of the bacterium Haemophilus influenzae. TIGR was soon joined by other biotechnology companies that competed directly with the publicly funded Human Genome Project. Among these other biotech firms is Celera Genomics, founded in 1998 by Venter in conjunction with the Perkin-Elmer Corporation, manufacturer of the world's fastest automatic DNA sequencers. Celera's goal was to privately sequence the human genome in direct competition with the public efforts supported by the NIH and DOE and the governments of several foreign countries. Using 300 Perkin-Elmer automatic DNA sequencers along with one of the world's most powerful supercomputers, Celera sequenced the genomes of several model organisms with remarkable speed and, in April 2000, announced that it had a preliminary sequence of the human genome.

In order eventually to make a profit, these biotech companies were patenting DNA sequences and intended ultimately to charge clients, including researchers, for access to their databases. This issue of patenting had already caused controversy. James Watson felt strongly that the sequence data flowing from the Human Genome Project should remain within the public domain, freely available to all. Meeting opposition to this view, he stepped down from his position as director of the NIH-sponsored project in 1992 and was succeeded by Francis Collins. Other researchers shared Watson's view, and in 1996 the international consortium of publicly funded laboratories agreed at a meeting in Bermuda to release all data to GenBank, a genome database maintained by NIH. The agreement reached by these scientists came to be known as "The Bermuda Principles," and it mandated that sequence data would be posted on the Internet within 24 hours of acquisition. Because the information is freely available to the public, the sequences cannot be patented.
The dispute between Celera Genomics and the International Human Genome Consortium continues, as scientists now begin the task of searching the genome for valuable information.
