Guy Cochrane, European Nucleotide Archive, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus
The databases of the International Nucleotide Sequence Database Collaboration (INSDC, DDBJ/EMBL/GenBank) and of the Trace and Short Read Archive collaborations between the NCBI and the EBI provide free and comprehensive access to nucleotide sequencing information, from raw sequencing machine output through to functional annotation. As well as offering a direct public portal into nucleotide sequence and annotation, the information embodied in the archives serves, through such projects as UniProt and Ensembl, as a foundation for the world’s bioinformatics data infrastructure. Among the challenges of ever-growing volumes of information is the need to organize data sensibly, so that users can easily retrieve, and compute upon, small and large data sets of interest within the main corpus. Key to this organization is the systematic capture and structuring of information about the biological source organism and molecule that have undergone sequence analysis. While long-established data structures exist for representing the more generic elements of this source information, recent community-focused initiatives have provided a number of alternative routes and structures for more specific information. In the talk, I will outline the services provided by the INSDC and Trace Archives, detail a number of paradigms for the representation of source organism and molecule information, and focus on the emerging strategy for incorporation of MIGS-compliant data into INSDC records.