Data Access Form

Dataset Title:	[Transcriptome statistics] - Transcriptome statistics from samples obtained on LMG1411 collected on the Gould (LMG1411) in the Western Antarctica Peninsula in 2014. (Polar Transcriptomes project) (Iron and Light Limitation in Ecologically Important Polar Diatoms: Comparative Transcriptomics and Development of Molecular Indicators)
Institution:	BCO-DMO (Dataset ID: bcodmo_dataset_665311)

Variable	Minimum	Maximum
species (unitless)	"Actinocyclus actin..."	"Thalassiosira anta..."
raw_sequence_reads (count)	681141	2071629
contigs_num (count)	6029	44909
isogroups_num (count)	4784	42346
transcriptome_size (Megabase)	2.2	22.9
mean_contig_length (base pair)	338	687
max_contig_length (base pair)	5810	8191
min_contig_length (base pair)	200	224
N50 (unitless)	315	935
contiguity (unitless)	0.07	0.25
BUSCO_pcnt (percent)	7	56
spliceosome_pcnt (percent)	21	87
ribosome_pcnt (percent)	60	81
KEGG (count)	"2158 [0.22]"	"5226 [0.12]"

The Dataset Attribute Structure (.das) for this Dataset

Attributes {
 s {
  species {
    String bcodmo_name "species";
    String description "Species analyzed";
    String long_name "Species";
    String units "unitless";
  }
  raw_sequence_reads {
    Int32 _FillValue 2147483647;
    Int32 actual_range 681141, 2071629;
    String bcodmo_name "unknown";
    String description "Total number of raw sequence reads per species";
    String long_name "Raw Sequence Reads";
    String units "count";
  }
  contigs_num {
    Int32 _FillValue 2147483647;
    Int32 actual_range 6029, 44909;
    String bcodmo_name "unknown";
    String description "Number of contigs per species.";
    String long_name "Contigs Num";
    String units "count";
  }
  isogroups_num {
    Int32 _FillValue 2147483647;
    Int32 actual_range 4784, 42346;
    String bcodmo_name "unknown";
    String description "Number of isogroups per species.";
    String long_name "Isogroups Num";
    String units "count";
  }
  transcriptome_size {
    Float32 _FillValue NaN;
    Float32 actual_range 2.2, 22.9;
    String bcodmo_name "unknown";
    String description "Transcriptome size by species.";
    String long_name "Transcriptome Size";
    String units "Megabase";
  }
  mean_contig_length {
    Int16 _FillValue 32767;
    Int16 actual_range 338, 687;
    String bcodmo_name "length";
    String description "Average contig length by species.";
    String long_name "Mean Contig Length";
    String units "base pair";
  }
  max_contig_length {
    Int16 _FillValue 32767;
    Int16 actual_range 5810, 8191;
    String bcodmo_name "length";
    String description "Maximum contig length by species.";
    String long_name "Max Contig Length";
    String units "base pair";
  }
  min_contig_length {
    Int16 _FillValue 32767;
    Int16 actual_range 200, 224;
    String bcodmo_name "length";
    String description "Minimum contig length by species.";
    String long_name "Min Contig Length";
    String units "base pair";
  }
  N50 {
    Int16 _FillValue 32767;
    Int16 actual_range 315, 935;
    String bcodmo_name "length";
    String description "N50 value; N50 length is defined as the shortest sequence length at 50% of the genome";
    String long_name "N50";
    String units "unitless";
  }
  contiguity {
    Float32 _FillValue NaN;
    Float32 actual_range 0.07, 0.25;
    String bcodmo_name "unknown";
    String description "Contiguity threshold 0.75";
    String long_name "Contiguity";
    String units "unitless";
  }
  BUSCO_pcnt {
    Byte _FillValue 127;
    String _Unsigned "false";
    Byte actual_range 7, 56;
    String bcodmo_name "unknown";
    String description "Completeness of genome based on 429 core eukaryotic genes";
    String long_name "BUSCO Pcnt";
    String units "percent";
  }
  spliceosome_pcnt {
    Byte _FillValue 127;
    String _Unsigned "false";
    Byte actual_range 21, 87;
    String bcodmo_name "unknown";
    String description "Spliceosome KAAS pathway completeness";
    String long_name "Spliceosome Pcnt";
    String units "percent";
  }
  ribosome_pcnt {
    Byte _FillValue 127;
    String _Unsigned "false";
    Byte actual_range 60, 81;
    String bcodmo_name "unknown";
    String description "Ribosome KAAS pathway completeness";
    String long_name "Ribosome Pcnt";
    String units "percent";
  }
  KEGG {
    String bcodmo_name "unknown";
    String description "KEGG value; Functionally annotated contigs";
    String long_name "KEGG";
    String units "count";
  }
 }
  NC_GLOBAL {
    String access_formats ".htmlTable,.csv,.json,.mat,.nc,.tsv";
    String acquisition_description 
"Nine species of diatoms were isolated from the Western Antarctic Peninsula
along the PalmerLTER sampling grid in 2013 and 2014. Isolations were performed
using an Olympus CKX41 inverted microscope by single cell isolation with a
micropipette (Anderson 2005). Diatom species were identified by morphological
characterization and 18S rRNA gene (rDNA) sequencing. DNA was extracted with
the DNeasy Plant Mini Kit according to the manufacturer's protocols
(Qiagen). Amplification of the nuclear 18S rDNA region was achieved with
standard PCR protocols using eukaryotic-specific, universal 18S forward and
reverse primers. Primer sequences were obtained from Medlin et al. (1982). The
length of the region amplified is approximately 1800 base pairs (bp).
Pseudo-nitzschia species are often difficult to identify by their
18S rDNA sequence; therefore, additional support for the taxonomic
identification of P. subcurvata was provided through sequencing
of the 18S-ITS1-5.8S regions. Amplification of this region was performed with
the 18SF-euk and 5.8SR_euk primers of Hubbard et al. (2008). PCR products were
purified using either QIAquick PCR Purification Kit (Qiagen) or ExoSAP-IT
(Affymetrix) and sequenced by Sanger DNA sequencing (Genewiz). Sequences were
edited using Geneious Pro software
(http://www.geneious.com, Kearse et al.,
2012) and BLASTn sequence homology searches were performed against the NCBI
nucleotide non-redundant (nr) database to determine species with a cutoff
identity of 98%.
 
Diatom phylogenetic analysis was performed with Geneious Pro and included 71
additional diatom 18S rDNA sequences from publicly available genomes and
transcriptomes, including those in the MMETSP database. Diatom sequences were
trimmed to the same length and aligned with MUSCLE (Edgar 2004). A
phylogenetic tree was created in Mega with the Maximum-likelihood method of
tree reconstruction, the Jukes-Cantor genetic distance model (Jukes and Cantor
1969), and 100 bootstrap replicates.
 
Illumina TruSeq adapters and poly-A tails were trimmed from raw reads using
the Fastx_toolkit clipper function. Fastq_quality_filter was used to remove
poor quality sequences, such that remaining sequences had a minimum quality
score of 20 with a minimum of 80% of bases within a read meeting
this quality score requirement. Any remaining raw sequences less than 50 base
pairs in length were also removed. Merged files were assembled de
novo using Trinity (Grabherr et al. 2011). The resulting assembly was
filtered to remove contigs less than 200 bp in length. Trinity-assembled
contigs which exhibited sequence overlap were grouped into isogroups which
were then used for sequence homology searches (BLASTx E-value ≤ 10^-4)
against the Kyoto Encyclopedia of Genes and Genomes (KEGG) databases (Kanehisa
2006).
 
BUSCO (Benchmarking Universal Single-Copy Orthologs) was used to assess the
completeness of genomes and transcriptomes based on sets of single-copy
orthologous groups derived from OrthoDB that are highly conserved
within multiple lineages (Felipe et al. 2015). Completed, duplicated and
fragmented orthologs were determined by meeting an 'expected score'
and having aligned sequences within two standard deviations of the BUSCO
gene's length. A second assessment of completeness was performed by
evaluating conserved pathways, such as the ribosome and spliceosome, using the
single-directional best-hit method in the KEGG Automatic Annotation
Server (KAAS) (Moriya et al. 2007). Finally, contiguity was
calculated at the 0.75 threshold according to Martin and Wang (2011) with
custom scripts.
 
For each transcriptome, unassembled sequence reads were aligned to the final
Trinity assembly using Bowtie 2 (Langmead 2012). Mapped reads were normalized
by the Reads per Kilobase per Million reads method (RPKM) (Mortazavi et al.
2008).
 
Gene biogeographical distributions: 20 genes of interest were selected
in the study to investigate the molecular basis of iron and light limitation
in polar diatoms. Reference sequences for each of these genes were obtained
from the F. cylindrus and P. tricornutum JGI genome portals
and the T. pseudonana and T. oceanica NCBI and
GenBank repositories. Reference sequences were identified in the
transcriptomes by translated nucleotide homology searches (tBLASTn) with an
e-value cutoff of <10^-5. A reciprocal tBLASTn homology search was performed
for each transcriptome against the KEGG GENES database, using the
single-directional best-hit method in the KAAS online tool to ensure
consistent gene annotations (Moriya et al. 2007).
 
Subsequently, reference sequences were identified in the MMETSP protein
database by BLASTp (e-value <10^-5) homology searches among the diatom
transcriptomes. The transcriptomes and their associated latitude and longitude
were obtained from iMicrobe Data Commons (Project Code CAM_P_0001000) and the
National Center for Marine Algae and Microbiota (NCMA). Custom Matlab scripts
allowed global biogeographical distribution of key genes of interest to be
mapped.";
    String awards_0_award_nid "653228";
    String awards_0_award_number "PLR-1341479";
    String awards_0_data_url "http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1341479";
    String awards_0_funder_name "NSF Division of Ocean Sciences";
    String awards_0_funding_acronym "NSF OCE";
    String awards_0_funding_source_nid "355";
    String awards_0_program_manager "Dr Chris H. Fritsen";
    String awards_0_program_manager_nid "50502";
    String cdm_data_type "Other";
    String comment 
"Transcriptome Statistics 
  Adrian Marchetti, PI 
  Version 11 October 2016";
    String Conventions "COARDS, CF-1.6, ACDD-1.3";
    String creator_email "info@bco-dmo.org";
    String creator_name "BCO-DMO";
    String creator_type "institution";
    String creator_url "https://www.bco-dmo.org/";
    String data_source "extract_data_as_tsv version 2.3  19 Dec 2019";
    String date_created "2016-11-18T23:55:55Z";
    String date_modified "2019-04-18T13:45:06Z";
    String defaultDataQuery "&time<now";
    String doi "10.1575/1912/bco-dmo.665311.1";
    String history 
"2026-07-23T21:04:15Z (local files)
2026-07-23T21:04:15Z https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_665311.html";
    String infoUrl "https://www.bco-dmo.org/dataset/665311";
    String institution "BCO-DMO";
    String instruments_0_acronym "Inverted Microscope";
    String instruments_0_dataset_instrument_description "Used to perform isolations";
    String instruments_0_dataset_instrument_nid "665318";
    String instruments_0_description 
"An inverted microscope is a microscope with its light source and condenser on the top, above the stage pointing down, while the objectives and turret are below the stage pointing up. It was invented in 1850 by J. Lawrence Smith, a faculty member of Tulane University (then named the Medical College of Louisiana).

Inverted microscopes are useful for observing living cells or organisms at the bottom of a large container (e.g. a tissue culture flask) under more natural conditions than on a glass slide, as is the case with a conventional microscope. Inverted microscopes are also used in micromanipulation applications where space above the specimen is required for manipulator mechanisms and the microtools they hold, and in metallurgical applications where polished samples can be placed on top of the stage and viewed from underneath using reflecting objectives.

The stage on an inverted microscope is usually fixed, and focus is adjusted by moving the objective lens along a vertical axis to bring it closer to or further from the specimen. The focus mechanism typically has a dual concentric knob for coarse and fine adjustment. Depending on the size of the microscope, four to six objective lenses of different magnifications may be fitted to a rotating turret known as a nosepiece. These microscopes may also be fitted with accessories for fitting still and video cameras, fluorescence illumination, confocal scanning and many other applications.";
    String instruments_0_instrument_external_identifier "https://vocab.nerc.ac.uk/collection/L05/current/LAB05/";
    String instruments_0_instrument_name "Inverted Microscope";
    String instruments_0_instrument_nid "675";
    String instruments_0_supplied_name "Olympus CKX41";
    String instruments_1_acronym "Bioanalyzer";
    String instruments_1_dataset_instrument_description "Used to determine RNA integrity";
    String instruments_1_dataset_instrument_nid "665321";
    String instruments_1_description "A Bioanalyzer is a laboratory instrument that provides the sizing and quantification of DNA, RNA, and proteins. One example is the Agilent Bioanalyzer 2100.";
    String instruments_1_instrument_name "Bioanalyzer";
    String instruments_1_instrument_nid "626182";
    String instruments_1_supplied_name "Agilent Bioanalyzer 2100";
    String keywords "bco, bco-dmo, biological, busco, BUSCO_pcnt, chemical, contig, contigs, contigs_num, contiguity, data, dataset, dmo, erddap, isogroups, isogroups_num, kegg, length, management, max, max_contig_length, mean, mean_contig_length, min, min_contig_length, n50, num, oceanography, office, pcnt, preliminary, raw, raw_sequence_reads, reads, ribosome, ribosome_pcnt, sequence, size, species, spliceosome, spliceosome_pcnt, transcriptome, transcriptome_size";
    String license "https://www.bco-dmo.org/dataset/665311/license";
    String metadata_source "https://www.bco-dmo.org/api/dataset/665311";
    String param_mapping "{'665311': {}}";
    String parameter_source "https://www.bco-dmo.org/mapserver/dataset/665311/parameters";
    String people_0_affiliation "University of North Carolina at Chapel Hill";
    String people_0_affiliation_acronym "UNC-Chapel Hill";
    String people_0_person_name "Adrian Marchetti";
    String people_0_person_nid "527120";
    String people_0_role "Principal Investigator";
    String people_0_role_type "originator";
    String people_1_affiliation "University of North Carolina at Chapel Hill";
    String people_1_affiliation_acronym "UNC-Chapel Hill";
    String people_1_person_name "Adrian Marchetti";
    String people_1_person_nid "527120";
    String people_1_role "Contact";
    String people_1_role_type "related";
    String people_2_affiliation "Woods Hole Oceanographic Institution";
    String people_2_affiliation_acronym "WHOI BCO-DMO";
    String people_2_person_name "Hannah Ake";
    String people_2_person_nid "650173";
    String people_2_role "BCO-DMO Data Manager";
    String people_2_role_type "related";
    String project "Polar_Transcriptomes";
    String projects_0_acronym "Polar_Transcriptomes";
    String projects_0_description 
"The Southern Ocean surrounding Antarctica is changing rapidly in response to Earth's warming climate. These changes will undoubtedly influence communities of primary producers (the organisms at the base of the food chain, particularly plant-like organisms using sunlight for energy) by altering conditions that influence their growth and composition. Because primary producers such as phytoplankton play an important role in global biogeochemical cycling, it is essential to understand how they will respond to changes in their environment. The growth of phytoplankton in certain regions of the Southern Ocean is constrained by steep gradients in chemical and physical properties that vary in both space and time. Light and iron have been identified as key variables influencing phytoplankton abundance and distribution within Antarctic waters. Microscopic algae known as diatoms are dominant members of the phytoplankton and sea ice communities, accounting for significant proportions of primary production. The overall objective of this project is to identify the molecular bases for the physiological responses of polar diatoms to varying light and iron conditions. The project should provide a means of evaluating the extent these factors regulate diatom growth and influence net community productivity in Antarctic waters. The project will also further the NSF goals of making scientific discoveries available to the general public and of training new generations of scientists. It will facilitate the teaching and learning of polar-related topics by translating the research objectives into readily accessible educational materials for middle-school students. This project will also provide funding to enable a graduate student and several undergraduate students to be trained in the techniques and perspectives of modern biology.
Although numerous studies have investigated how polar diatoms are affected by varying light and iron, the cellular mechanisms leading to their distinct physiological responses remain unknown. Using comparative transcriptomics, the expression patterns of key genes and metabolic pathways in several ecologically important polar diatoms recently isolated from Antarctic waters and grown under varying iron and irradiance conditions will be examined. In addition, molecular indicators for iron and light limitation will be developed within these polar diatoms through the identification of iron- and light-responsive genes -- the expression patterns of which can be used to determine their physiological status. Upon verification in laboratory cultures, these indicators will be utilized by way of metatranscriptomic sequencing to examine iron and light limitation in natural diatom assemblages collected along environmental gradients in Western Antarctic Peninsula waters. In order to fully understand the role phytoplankton play in Southern Ocean biogeochemical cycles, dependable methods that provide a means of elucidating the physiological status of phytoplankton at any given time and location are essential.";
    String projects_0_end_date "2017-07";
    String projects_0_geolocation "Antarctica";
    String projects_0_name "Iron and Light Limitation in Ecologically Important Polar Diatoms: Comparative Transcriptomics and Development of Molecular Indicators";
    String projects_0_project_nid "653229";
    String projects_0_project_website "http://www.nsf.gov/awardsearch/showAward?AWD_ID=1341479";
    String projects_0_start_date "2014-08";
    String publisher_name "Biological and Chemical Oceanographic Data Management Office (BCO-DMO)";
    String publisher_type "institution";
    String sourceUrl "(local files)";
    String standard_name_vocabulary "CF Standard Name Table v55";
    String summary "Transcriptome statistics from samples obtained on LMG1411 collected on the Gould (LMG1411) in the Western Antarctica Peninsula in 2014. (Polar Transcriptomes project)";
    String title "[Transcriptome statistics] - Transcriptome statistics from samples obtained on LMG1411 collected on the Gould (LMG1411) in the Western Antarctica Peninsula in 2014. (Polar Transcriptomes project) (Iron and Light Limitation in Ecologically Important Polar Diatoms: Comparative Transcriptomics and Development of Molecular Indicators)";
    String version "1";
    String xml_source "osprey2erddap.update_xml() v1.3";
  }
}
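
The attribute listing above declares a _FillValue sentinel for each numeric variable (2147483647 for Int32, 32767 for Int16, 127 for Byte, NaN for Float32). As a minimal sketch of how a client might treat those sentinels as missing data, assuming rows have already been parsed into Python numbers (the helper name mask_fill is hypothetical and not part of ERDDAP):

```python
import math

# Sentinel values copied from the _FillValue attributes in the .das listing.
FILL_VALUES = {
    "raw_sequence_reads": 2147483647,  # Int32
    "contigs_num": 2147483647,         # Int32
    "isogroups_num": 2147483647,       # Int32
    "mean_contig_length": 32767,       # Int16
    "max_contig_length": 32767,        # Int16
    "min_contig_length": 32767,        # Int16
    "N50": 32767,                      # Int16
    "BUSCO_pcnt": 127,                 # Byte
    "spliceosome_pcnt": 127,           # Byte
    "ribosome_pcnt": 127,              # Byte
}


def mask_fill(variable, value):
    """Return None if value is the declared fill value (or NaN), else value.

    Hypothetical helper for illustration only.
    """
    # Float32 variables (transcriptome_size, contiguity) use NaN as the fill value.
    if isinstance(value, float) and math.isnan(value):
        return None
    if FILL_VALUES.get(variable) == value:
        return None
    return value
```

For example, mask_fill("N50", 32767) yields None, while mask_fill("N50", 935) returns 935 unchanged.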

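The N50 attribute above is described as the shortest sequence length at 50% of the assembly. The dataset does not state how the authors computed it, so the following is only a sketch of the commonly used definition: sort contig lengths from longest to shortest, accumulate them, and report the length of the contig at which the running total first reaches half of the summed length. The function name n50 and the sample lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (common definition).

    Illustrative sketch; not necessarily the procedure used for this dataset.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths, not values from the dataset.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))  # prints 8
```

In the sample, the total is 54, and the three longest contigs (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.
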
Using tabledap to Request Data and Graphs from Tabular Datasets

tabledap lets you request a data subset, a graph, or a map from a tabular dataset (for example, buoy data), via a specially formed URL. tabledap uses the OPeNDAP Data Access Protocol (DAP) and its selection constraints.

The URL specifies what you want: the dataset, a description of the graph or the subset of the data, and the file type for the response.

(easy) You can get data by using the dataset's Data Access Form or Subset form. They make the URL for you.
(easy) You can make a graph or map by using the dataset's Make A Graph form. It makes the URL for you.
(not hard) You can bypass the forms and get the data or make a graph or map by generating the URL by hand or with a computer program or script.

Tabledap request URLs must be in the form
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/datasetID.fileType{?query}
For example,
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/pmelTaoDySst.htmlTable?longitude,latitude,time,station,wmo_platform_code,T_25&time>=2015-05-23T12:00:00Z&time<=2015-05-31T12:00:00Z
Thus, the query is often a comma-separated list of desired variable names, followed by a collection of constraints (e.g., variable<value), each preceded by '&' (which is interpreted as "AND").
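
The URL pattern above can be assembled programmatically. The sketch below builds a request URL from a base address, a dataset ID, a file type, a list of variables, and a list of constraints. The function name build_tabledap_url is hypothetical. The base address https://erddap.bco-dmo.org/erddap is taken from this dataset's history attribute. The N50>=500 constraint is only an illustration. Constraints are percent-encoded on the assumption that the server expects characters such as '>' and '<' to be escaped.

```python
from urllib.parse import quote


def build_tabledap_url(base, dataset_id, file_type, variables=(), constraints=()):
    """Assemble a tabledap request URL of the form
    base/tabledap/datasetID.fileType{?query}.

    Hypothetical helper for illustration; it does not contact the server.
    """
    url = f"{base.rstrip('/')}/tabledap/{dataset_id}.{file_type}"
    # The query starts with a comma-separated list of variable names.
    query = ",".join(variables)
    # Each constraint is preceded by '&' (interpreted as AND).
    for constraint in constraints:
        query += "&" + quote(constraint, safe="=")
    return f"{url}?{query}" if query else url


print(build_tabledap_url(
    "https://erddap.bco-dmo.org/erddap",
    "bcodmo_dataset_665311",
    "csv",
    variables=["species", "N50"],
    constraints=["N50>=500"],
))
# prints https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_665311.csv?species,N50&N50%3E=500
```

The file type must be one the dataset offers; the access_formats attribute for this dataset lists .htmlTable, .csv, .json, .mat, .nc, and .tsv.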

For details, see the tabledap Documentation.