bcodmo_dataset_665311

Name: [Transcriptome statistics] - Transcriptome statistics from samples obtained on LMG1411 collected on the Gould (LMG1411) in the Western Antarctica Peninsula in 2014. (Polar Transcriptomes project) (Iron and Light Limitation in Ecologically Important Polar Diatoms: Comparative Transcriptomics and Development of Molecular Indicators)
Creator: BCO-DMO
License: https://www.bco-dmo.org/dataset/665311/license


The Dataset's Variables and Attributes

Row Type	Variable Name	Attribute Name	Data Type	Value
attribute	NC_GLOBAL	access_formats	String	.htmlTable,.csv,.json,.mat,.nc,.tsv
attribute	NC_GLOBAL	acquisition_description	String	Nine species of diatoms were isolated from the Western Antarctic Peninsula along the Palmer LTER sampling grid in 2013 and 2014. Isolations were performed using an Olympus CKX41 inverted microscope by single-cell isolation with a micropipette (Anderson 2005). Diatom species were identified by morphological characterization and 18S rRNA gene (rDNA) sequencing. DNA was extracted with the DNeasy Plant Mini Kit according to the manufacturer’s protocols (Qiagen). Amplification of the nuclear 18S rDNA region was achieved with standard PCR protocols using eukaryotic-specific, universal 18S forward and reverse primers. Primer sequences were obtained from Medlin et al. (1982). The length of the region amplified is approximately 1800 base pairs (bp). Pseudo-nitzschia species are often difficult to identify by their 18S rDNA sequence; therefore, additional support for the taxonomic identification of P. subcurvata was provided through sequencing of the 18S-ITS1-5.8S regions. Amplification of this region was performed with the 18SF-euk and 5.8SR_euk primers of Hubbard et al. (2008). PCR products were purified using either the QIAquick PCR Purification Kit (Qiagen) or ExoSAP-IT (Affymetrix) and sequenced by Sanger DNA sequencing (Genewiz). Sequences were edited using Geneious Pro software (http://www.geneious.com, Kearse et al., 2012), and BLASTn sequence homology searches were performed against the NCBI nucleotide non-redundant (nr) database to determine species with a cutoff identity of 98%. Diatom phylogenetic analysis was performed with Geneious Pro and included 71 additional diatom 18S rDNA sequences from publicly available genomes and transcriptomes, including those in the MMETSP database. Diatom sequences were trimmed to the same length and aligned with MUSCLE (Edgar 2004).
A phylogenetic tree was created in MEGA with the maximum-likelihood method of tree reconstruction, the Jukes-Cantor genetic distance model (Jukes and Cantor 1969), and 100 bootstrap replicates. Illumina TruSeq adapters and poly-A tails were trimmed from raw reads using the Fastx_toolkit clipper function. Fastq_quality_filter was used to remove poor-quality sequences, such that remaining sequences had a minimum quality score of 20 with a minimum of 80% of bases within a read meeting this quality score requirement. Any remaining raw sequences shorter than 50 base pairs were also removed. Merged files were assembled de novo using Trinity (Grabherr et al. 2011). The resulting assembly was filtered to remove contigs shorter than 200 bp. Trinity-assembled contigs that exhibited sequence overlap were grouped into isogroups, which were then used for sequence homology searches (BLASTx E-value ≤ 10^-4) against the Kyoto Encyclopedia of Genes and Genomes (KEGG) databases (Kanehisa 2006). BUSCO (Benchmarking Universal Single-Copy Orthologs) was used to assess the completeness of genomes and transcriptomes based on sets of single-copy orthologous groups derived from OrthoDB that are highly conserved within multiple lineages (Felipe et al. 2015). Completed, duplicated and fragmented orthologs were determined by meeting an ‘expected score’ and having aligned sequences within two standard deviations of the BUSCO gene’s length. A second measure of completeness was obtained by evaluating conserved pathways, such as the ribosome and spliceosome, using the single-directional best-hit method in the KEGG Automatic Annotation Server (KAAS) (Moriya et al. 2007). Finally, contiguity was calculated at the 0.75 threshold according to Martin and Wang (2011) with custom scripts.
For each transcriptome, unassembled sequence reads were aligned to the final Trinity assembly using Bowtie 2 (Langmead 2012). Mapped reads were normalized by the reads per kilobase per million reads (RPKM) method (Mortazavi et al. 2008). Gene biogeographical distributions: 20 genes of interest were selected in the study to investigate the molecular basis of iron and light limitation in polar diatoms. Reference sequences for each of these genes were obtained from the F. cylindrus and P. tricornutum JGI genome portals and the T. pseudonana and T. oceanica NCBI and GenBank repositories. Reference sequences were identified in the transcriptomes by translated nucleotide homology searches (tBLASTn) with an e-value cutoff of <10^-5. A reciprocal tBLASTn homology search was performed for each transcriptome against the KEGG GENES database, using the single-directional best-hit method in the KAAS online tool to ensure consistent gene annotations (Moriya et al. 2007). Subsequently, reference sequences were identified in the MMETSP protein database by BLASTp (e-value <10^-5) homology searches among the diatom transcriptomes. The transcriptomes and their associated latitude and longitude were obtained from iMicrobe Data Commons (Project Code CAM_P_0001000) and the National Center for Marine Algae and Microbiota (NCMA). Custom MATLAB scripts were used to map the global biogeographical distribution of key genes of interest.
attribute	NC_GLOBAL	awards_0_award_nid	String	653228
attribute	NC_GLOBAL	awards_0_award_number	String	PLR-1341479
attribute	NC_GLOBAL	awards_0_data_url	String	http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1341479
attribute	NC_GLOBAL	awards_0_funder_name	String	NSF Division of Ocean Sciences
attribute	NC_GLOBAL	awards_0_funding_acronym	String	NSF OCE
attribute	NC_GLOBAL	awards_0_funding_source_nid	String	355
attribute	NC_GLOBAL	awards_0_program_manager	String	Dr Chris H. Fritsen
attribute	NC_GLOBAL	awards_0_program_manager_nid	String	50502
attribute	NC_GLOBAL	cdm_data_type	String	Other
attribute	NC_GLOBAL	comment	String	Transcriptome Statistics; Adrian Marchetti, PI; Version 11 October 2016
attribute	NC_GLOBAL	Conventions	String	COARDS, CF-1.6, ACDD-1.3
attribute	NC_GLOBAL	creator_email	String	info at bco-dmo.org
attribute	NC_GLOBAL	creator_name	String	BCO-DMO
attribute	NC_GLOBAL	creator_type	String	institution
attribute	NC_GLOBAL	creator_url	String	https://www.bco-dmo.org/
attribute	NC_GLOBAL	data_source	String	extract_data_as_tsv version 2.3 19 Dec 2019
attribute	NC_GLOBAL	date_created	String	2016-11-18T23:55:55Z
attribute	NC_GLOBAL	date_modified	String	2019-04-18T13:45:06Z
attribute	NC_GLOBAL	defaultDataQuery	String	&time<now
attribute	NC_GLOBAL	doi	String	10.1575/1912/bco-dmo.665311.1
attribute	NC_GLOBAL	infoUrl	String	https://www.bco-dmo.org/dataset/665311
attribute	NC_GLOBAL	institution	String	BCO-DMO
attribute	NC_GLOBAL	instruments_0_acronym	String	Inverted Microscope
attribute	NC_GLOBAL	instruments_0_dataset_instrument_description	String	Used to perform isolations
attribute	NC_GLOBAL	instruments_0_dataset_instrument_nid	String	665318
attribute	NC_GLOBAL	instruments_0_description	String	An inverted microscope is a microscope with its light source and condenser on the top, above the stage pointing down, while the objectives and turret are below the stage pointing up. It was invented in 1850 by J. Lawrence Smith, a faculty member of Tulane University (then named the Medical College of Louisiana). Inverted microscopes are useful for observing living cells or organisms at the bottom of a large container (e.g. a tissue culture flask) under more natural conditions than on a glass slide, as is the case with a conventional microscope. Inverted microscopes are also used in micromanipulation applications where space above the specimen is required for manipulator mechanisms and the microtools they hold, and in metallurgical applications where polished samples can be placed on top of the stage and viewed from underneath using reflecting objectives. The stage on an inverted microscope is usually fixed, and focus is adjusted by moving the objective lens along a vertical axis to bring it closer to or further from the specimen. The focus mechanism typically has a dual concentric knob for coarse and fine adjustment. Depending on the size of the microscope, four to six objective lenses of different magnifications may be fitted to a rotating turret known as a nosepiece. These microscopes may also be fitted with accessories for fitting still and video cameras, fluorescence illumination, confocal scanning and many other applications.
attribute	NC_GLOBAL	instruments_0_instrument_external_identifier	String	https://vocab.nerc.ac.uk/collection/L05/current/LAB05/
attribute	NC_GLOBAL	instruments_0_instrument_name	String	Inverted Microscope
attribute	NC_GLOBAL	instruments_0_instrument_nid	String	675
attribute	NC_GLOBAL	instruments_0_supplied_name	String	Olympus CKX41
attribute	NC_GLOBAL	instruments_1_acronym	String	Bioanalyzer
attribute	NC_GLOBAL	instruments_1_dataset_instrument_description	String	Used to determine RNA integrity
attribute	NC_GLOBAL	instruments_1_dataset_instrument_nid	String	665321
attribute	NC_GLOBAL	instruments_1_description	String	A Bioanalyzer is a laboratory instrument that provides the sizing and quantification of DNA, RNA, and proteins. One example is the Agilent Bioanalyzer 2100.
attribute	NC_GLOBAL	instruments_1_instrument_name	String	Bioanalyzer
attribute	NC_GLOBAL	instruments_1_instrument_nid	String	626182
attribute	NC_GLOBAL	instruments_1_supplied_name	String	Agilent Bioanalyzer 2100
attribute	NC_GLOBAL	keywords	String	bco, bco-dmo, biological, busco, BUSCO_pcnt, chemical, contig, contigs, contigs_num, contiguity, data, dataset, dmo, erddap, isogroups, isogroups_num, kegg, length, management, max, max_contig_length, mean, mean_contig_length, min, min_contig_length, n50, num, oceanography, office, pcnt, preliminary, raw, raw_sequence_reads, reads, ribosome, ribosome_pcnt, sequence, size, species, spliceosome, spliceosome_pcnt, transcriptome, transcriptome_size
attribute	NC_GLOBAL	license	String	https://www.bco-dmo.org/dataset/665311/license
attribute	NC_GLOBAL	metadata_source	String	https://www.bco-dmo.org/api/dataset/665311
attribute	NC_GLOBAL	param_mapping	String	{'665311': {}}
attribute	NC_GLOBAL	parameter_source	String	https://www.bco-dmo.org/mapserver/dataset/665311/parameters
attribute	NC_GLOBAL	people_0_affiliation	String	University of North Carolina at Chapel Hill
attribute	NC_GLOBAL	people_0_affiliation_acronym	String	UNC-Chapel Hill
attribute	NC_GLOBAL	people_0_person_name	String	Adrian Marchetti
attribute	NC_GLOBAL	people_0_person_nid	String	527120
attribute	NC_GLOBAL	people_0_role	String	Principal Investigator
attribute	NC_GLOBAL	people_0_role_type	String	originator
attribute	NC_GLOBAL	people_1_affiliation	String	University of North Carolina at Chapel Hill
attribute	NC_GLOBAL	people_1_affiliation_acronym	String	UNC-Chapel Hill
attribute	NC_GLOBAL	people_1_person_name	String	Adrian Marchetti
attribute	NC_GLOBAL	people_1_person_nid	String	527120
attribute	NC_GLOBAL	people_1_role	String	Contact
attribute	NC_GLOBAL	people_1_role_type	String	related
attribute	NC_GLOBAL	people_2_affiliation	String	Woods Hole Oceanographic Institution
attribute	NC_GLOBAL	people_2_affiliation_acronym	String	WHOI BCO-DMO
attribute	NC_GLOBAL	people_2_person_name	String	Hannah Ake
attribute	NC_GLOBAL	people_2_person_nid	String	650173
attribute	NC_GLOBAL	people_2_role	String	BCO-DMO Data Manager
attribute	NC_GLOBAL	people_2_role_type	String	related
attribute	NC_GLOBAL	project	String	Polar_Transcriptomes
attribute	NC_GLOBAL	projects_0_acronym	String	Polar_Transcriptomes
attribute	NC_GLOBAL	projects_0_description	String	The Southern Ocean surrounding Antarctica is changing rapidly in response to Earth's warming climate. These changes will undoubtedly influence communities of primary producers (the organisms at the base of the food chain, particularly plant-like organisms using sunlight for energy) by altering conditions that influence their growth and composition. Because primary producers such as phytoplankton play an important role in global biogeochemical cycling, it is essential to understand how they will respond to changes in their environment. The growth of phytoplankton in certain regions of the Southern Ocean is constrained by steep gradients in chemical and physical properties that vary in both space and time. Light and iron have been identified as key variables influencing phytoplankton abundance and distribution within Antarctic waters. Microscopic algae known as diatoms are dominant members of the phytoplankton and sea ice communities, accounting for significant proportions of primary production. The overall objective of this project is to identify the molecular bases for the physiological responses of polar diatoms to varying light and iron conditions. The project should provide a means of evaluating the extent these factors regulate diatom growth and influence net community productivity in Antarctic waters. The project will also further the NSF goals of making scientific discoveries available to the general public and of training new generations of scientists. It will facilitate the teaching and learning of polar-related topics by translating the research objectives into readily accessible educational materials for middle-school students. This project will also provide funding to enable a graduate student and several undergraduate students to be trained in the techniques and perspectives of modern biology. 
Although numerous studies have investigated how polar diatoms are affected by varying light and iron, the cellular mechanisms leading to their distinct physiological responses remain unknown. Using comparative transcriptomics, the expression patterns of key genes and metabolic pathways in several ecologically important polar diatoms recently isolated from Antarctic waters and grown under varying iron and irradiance conditions will be examined. In addition, molecular indicators for iron and light limitation will be developed within these polar diatoms through the identification of iron- and light-responsive genes -- the expression patterns of which can be used to determine their physiological status. Upon verification in laboratory cultures, these indicators will be utilized by way of metatranscriptomic sequencing to examine iron and light limitation in natural diatom assemblages collected along environmental gradients in Western Antarctic Peninsula waters. In order to fully understand the role phytoplankton play in Southern Ocean biogeochemical cycles, dependable methods that provide a means of elucidating the physiological status of phytoplankton at any given time and location are essential.
attribute	NC_GLOBAL	projects_0_end_date	String	2017-07
attribute	NC_GLOBAL	projects_0_geolocation	String	Antarctica
attribute	NC_GLOBAL	projects_0_name	String	Iron and Light Limitation in Ecologically Important Polar Diatoms: Comparative Transcriptomics and Development of Molecular Indicators
attribute	NC_GLOBAL	projects_0_project_nid	String	653229
attribute	NC_GLOBAL	projects_0_project_website	String	http://www.nsf.gov/awardsearch/showAward?AWD_ID=1341479
attribute	NC_GLOBAL	projects_0_start_date	String	2014-08
attribute	NC_GLOBAL	publisher_name	String	Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
attribute	NC_GLOBAL	publisher_type	String	institution
attribute	NC_GLOBAL	sourceUrl	String	(local files)
attribute	NC_GLOBAL	standard_name_vocabulary	String	CF Standard Name Table v55
attribute	NC_GLOBAL	summary	String	Transcriptome statistics from samples collected on the Gould (cruise LMG1411) in the Western Antarctic Peninsula in 2014. (Polar Transcriptomes project)
attribute	NC_GLOBAL	title	String	[Transcriptome statistics] - Transcriptome statistics from samples obtained on LMG1411 collected on the Gould (LMG1411) in the Western Antarctica Peninsula in 2014. (Polar Transcriptomes project) (Iron and Light Limitation in Ecologically Important Polar Diatoms: Comparative Transcriptomics and Development of Molecular Indicators)
attribute	NC_GLOBAL	version	String	1
attribute	NC_GLOBAL	xml_source	String	osprey2erddap.update_xml() v1.3
variable	species		String
attribute	species	bcodmo_name	String	species
attribute	species	description	String	Species analyzed
attribute	species	long_name	String	Species
attribute	species	units	String	unitless
variable	raw_sequence_reads		int
attribute	raw_sequence_reads	_FillValue	int	2147483647
attribute	raw_sequence_reads	actual_range	int	681141, 2071629
attribute	raw_sequence_reads	bcodmo_name	String	unknown
attribute	raw_sequence_reads	description	String	Total number of raw sequence reads per species
attribute	raw_sequence_reads	long_name	String	Raw Sequence Reads
attribute	raw_sequence_reads	units	String	count
variable	contigs_num		int
attribute	contigs_num	_FillValue	int	2147483647
attribute	contigs_num	actual_range	int	6029, 44909
attribute	contigs_num	bcodmo_name	String	unknown
attribute	contigs_num	description	String	Number of contigs per species.
attribute	contigs_num	long_name	String	Contigs Num
attribute	contigs_num	units	String	count
variable	isogroups_num		int
attribute	isogroups_num	_FillValue	int	2147483647
attribute	isogroups_num	actual_range	int	4784, 42346
attribute	isogroups_num	bcodmo_name	String	unknown
attribute	isogroups_num	description	String	Number of isogroups per species.
attribute	isogroups_num	long_name	String	Isogroups Num
attribute	isogroups_num	units	String	count
variable	transcriptome_size		float
attribute	transcriptome_size	_FillValue	float	NaN
attribute	transcriptome_size	actual_range	float	2.2, 22.9
attribute	transcriptome_size	bcodmo_name	String	unknown
attribute	transcriptome_size	description	String	Transcriptome size by species.
attribute	transcriptome_size	long_name	String	Transcriptome Size
attribute	transcriptome_size	units	String	Megabase
variable	mean_contig_length		short
attribute	mean_contig_length	_FillValue	short	32767
attribute	mean_contig_length	actual_range	short	338, 687
attribute	mean_contig_length	bcodmo_name	String	length
attribute	mean_contig_length	description	String	Average contig length by species.
attribute	mean_contig_length	long_name	String	Mean Contig Length
attribute	mean_contig_length	units	String	base pair
variable	max_contig_length		short
attribute	max_contig_length	_FillValue	short	32767
attribute	max_contig_length	actual_range	short	5810, 8191
attribute	max_contig_length	bcodmo_name	String	length
attribute	max_contig_length	description	String	Maximum contig length by species.
attribute	max_contig_length	long_name	String	Max Contig Length
attribute	max_contig_length	units	String	base pair
variable	min_contig_length		short
attribute	min_contig_length	_FillValue	short	32767
attribute	min_contig_length	actual_range	short	200, 224
attribute	min_contig_length	bcodmo_name	String	length
attribute	min_contig_length	description	String	Minimum contig length by species.
attribute	min_contig_length	long_name	String	Min Contig Length
attribute	min_contig_length	units	String	base pair
variable	N50		short
attribute	N50	_FillValue	short	32767
attribute	N50	actual_range	short	315, 935
attribute	N50	bcodmo_name	String	length
attribute	N50	description	String	N50 value; the contig length such that contigs of this length or longer contain at least 50% of the total assembled bases
attribute	N50	long_name	String	N50
attribute	N50	units	String	base pair
variable	contiguity		float
attribute	contiguity	_FillValue	float	NaN
attribute	contiguity	actual_range	float	0.07, 0.25
attribute	contiguity	bcodmo_name	String	unknown
attribute	contiguity	description	String	Contiguity calculated at the 0.75 threshold
attribute	contiguity	long_name	String	Contiguity
attribute	contiguity	units	String	unitless
variable	BUSCO_pcnt		byte
attribute	BUSCO_pcnt	_FillValue	byte	127
attribute	BUSCO_pcnt	actual_range	byte	7, 56
attribute	BUSCO_pcnt	bcodmo_name	String	unknown
attribute	BUSCO_pcnt	description	String	Completeness of transcriptome based on 429 core eukaryotic genes (BUSCO)
attribute	BUSCO_pcnt	long_name	String	BUSCO Pcnt
attribute	BUSCO_pcnt	units	String	percent
variable	spliceosome_pcnt		byte
attribute	spliceosome_pcnt	_FillValue	byte	127
attribute	spliceosome_pcnt	actual_range	byte	21, 87
attribute	spliceosome_pcnt	bcodmo_name	String	unknown
attribute	spliceosome_pcnt	description	String	Spliceosome KAAS pathway completeness
attribute	spliceosome_pcnt	long_name	String	Spliceosome Pcnt
attribute	spliceosome_pcnt	units	String	percent
variable	ribosome_pcnt		byte
attribute	ribosome_pcnt	_FillValue	byte	127
attribute	ribosome_pcnt	actual_range	byte	60, 81
attribute	ribosome_pcnt	bcodmo_name	String	unknown
attribute	ribosome_pcnt	description	String	Ribosome KAAS pathway completeness
attribute	ribosome_pcnt	long_name	String	Ribosome Pcnt
attribute	ribosome_pcnt	units	String	percent
variable	KEGG		String
attribute	KEGG	bcodmo_name	String	unknown
attribute	KEGG	description	String	KEGG value; Functionally annotated contigs
attribute	KEGG	long_name	String	KEGG
attribute	KEGG	units	String	count
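
The assembly statistics listed above (contigs_num, transcriptome_size, mean_contig_length, max_contig_length, min_contig_length, N50) can be illustrated with a minimal Python sketch. It is not the investigators' script. It assumes that the input is a plain list of contig lengths in bp, that transcriptome_size is the sum of the retained contig lengths expressed in megabases, and that contigs shorter than 200 bp are dropped first, as the acquisition description states. The function names n50 and contig_stats are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (bp).

    Contigs are sorted longest first and the lengths accumulated; the N50 is
    the length of the contig at which the running total first reaches half
    of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


def contig_stats(lengths, min_length=200):
    """Summarise the contigs that are at least min_length bp long.

    The dictionary keys mirror the variable names in the table; the
    megabase conversion for the size is an assumption of this sketch.
    """
    kept = [n for n in lengths if n >= min_length]
    return {
        "contigs_num": len(kept),
        "transcriptome_size_mb": sum(kept) / 1e6,
        "mean_contig_length": round(sum(kept) / len(kept)),
        "max_contig_length": max(kept),
        "min_contig_length": min(kept),
        "N50": n50(kept),
    }
```

For example, contig_stats([150, 200, 300, 400, 500, 600]) drops the 150 bp contig and reports an N50 of 500, because the two longest contigs (600 + 500 bp) already cover more than half of the 2000 retained bases.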

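Two read-level rules in the acquisition description lend themselves to a short worked example: the read filter (at least 80% of bases with a quality score of 20 or more, and a length of at least 50 bp) and RPKM normalization. The Python sketch below only restates those rules; it does not reproduce Fastq_quality_filter or the investigators' scripts. The function names, and the representation of a read as a list of integer quality scores, are assumptions made for illustration.

```python
def passes_read_filter(quals, min_q=20, min_fraction=0.80, min_len=50):
    """Return True if a read meets the stated quality and length rules.

    quals is a list of per-base quality scores (integers) for one read.
    """
    if len(quals) < min_len:
        return False
    good = sum(1 for q in quals if q >= min_q)
    return good / len(quals) >= min_fraction


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads.

    RPKM = read_count * 1e9 / (length_bp * total_mapped_reads)
    """
    return read_count * 1e9 / (length_bp * total_mapped_reads)
```

As an illustration, 500 reads mapped to a 2000 bp contig in a library of 1,000,000 mapped reads gives rpkm(500, 2000, 1_000_000) = 250.0.
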
The information in the table above is also available in other file formats (.csv, .htmlTable, .itx, .json, .jsonlCSV1, .jsonlCSV, .jsonlKVP, .mat, .nc, .nccsv, .tsv, .xhtml) via a RESTful web service.
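
A request to that web service might be built as in the sketch below. The server address in ERDDAP_BASE is an assumption, since this page does not state it, and the helper names are hypothetical. The sketch also assumes the usual ERDDAP .csv layout, in which the first line holds column names and the second line holds units. Only the Python standard library is used, and no network request is made.

```python
import csv
import io
from urllib.parse import quote

ERDDAP_BASE = "https://erddap.bco-dmo.org/erddap"  # assumed server address
DATASET_ID = "bcodmo_dataset_665311"               # dataset ID from this page


def tabledap_url(variables, file_type=".csv",
                 base=ERDDAP_BASE, dataset_id=DATASET_ID):
    """Build a tabledap request URL for the given variable names."""
    query = quote(",".join(variables), safe=",")
    return f"{base}/tabledap/{dataset_id}{file_type}?{query}"


def parse_erddap_csv(text):
    """Turn ERDDAP .csv text into a list of dicts, skipping the units line."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[2:]  # rows[1] is assumed to hold units
    return [dict(zip(header, record)) for record in records]
```

Calling tabledap_url(["species", "N50", "contiguity"]) yields a URL ending in bcodmo_dataset_665311.csv?species,N50,contiguity, and the text returned from that URL can be passed to parse_erddap_csv.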