Personnel and Responsibilities.
The proposed research involves the coordinated efforts of five laboratories at four universities (two public and two private), as well as more than 100 participating scientists from 23 countries. Four laboratories will focus on the collection and analysis of molecular data from the major groups of Fungi, and one laboratory will concentrate on the collection and analysis of non-nucleotide data across all groups of Fungi. The unique responsibilities of the participating laboratories are outlined below:
The proposed duration of this project is four years. The sampling of the molecular component of the project will include 1500 taxa and 10,000 base pairs from seven genes. The laboratories of Hibbett, Lutzoni, and Spatafora will each sample 400 taxa. Vilgalys, who will coordinate the sampling of the Chytridiomycota and Zygomycota (the basal lineages of Fungi), will sample 300 taxa. However, because many of the molecular markers have not been characterized as extensively for the basal lineages of Fungi, this group will likely require additional attention and resources, and thus sampling of 300 taxa will likely require the same amount of time, effort, and expense as sampling 400 taxa in the other groups. We anticipate that sampling will proceed at the rate of approximately 75-100 taxa per year per laboratory. Acquisition of non-molecular data must take into account both collection of new data and compilation of existing data. We estimate that the non-nucleotide laboratory will accumulate data at the rate of 10 taxa per year for new non-molecular data and ~100 taxa per year for existing non-molecular data. All data will be released immediately and all web-accessible databases will be up and functioning during the first year of the project. The PIs are currently in communication with prospective postdoctoral research associates and graduate students, and we anticipate that we will be able to fill these positions by January 1, 2003. The visiting graduate student programs will be in place by April 1, 2003.
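The sampling figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the per-laboratory targets, the four-year duration, and the 75-100 taxa per year rate stated in the text, while the variable and function names are our own.

```python
# Planned taxa per molecular laboratory, as stated in the text.
planned_taxa = {"Hibbett": 400, "Lutzoni": 400, "Spatafora": 400, "Vilgalys": 300}

PROJECT_YEARS = 4
RATE_RANGE = (75, 100)  # taxa per year per laboratory (stated estimate)

# Total taxa across the four molecular laboratories.
total_taxa = sum(planned_taxa.values())

# Taxa a single laboratory can sample over the project at the stated rates.
per_lab_range = tuple(rate * PROJECT_YEARS for rate in RATE_RANGE)


def years_needed(taxa, rate):
    """Years required to sample `taxa` taxa at `rate` taxa per year."""
    return taxa / rate


print(total_taxa)     # 1500
print(per_lab_range)  # (300, 400)
```

At the stated rates a laboratory samples 300-400 taxa in four years, which brackets both the 300-taxon and the 400-taxon targets.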
Annual milestones for judging productivity and progress.
Sampling of taxa and genes. The goal of the molecular component of the project is to sample 7 loci for 1500 taxa, at the rate of 75-100 taxa per year per laboratory. The annual data collection productivity of the project will be represented as the percentage of both taxa and genes sequenced per year per laboratory. We will also maintain information on the progress made in sequencing of genes by taxon. This will allow us to easily recognize lineages for which a given locus may be problematic and require redesign of PCR primers or sequencing strategy. This compilation will be automated and will allow a continuous assessment of the achievements in terms of sequencing, material availability for targeted taxa, and assembled data sets.
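One way such an automated compilation might work is sketched below. This is a hedged illustration, not the project's tracking software: the record layout `(laboratory, taxon, locus)`, the function names, the 0.5 threshold, and the placeholder names (`lab_1`, `taxon_A`, `lineage_X`) are all assumptions made for the example.

```python
from collections import defaultdict


def percent_complete(records, targets, loci):
    """Percentage of taxon-by-locus cells completed per laboratory.

    records: iterable of (laboratory, taxon, locus) for finished sequences.
    targets: dict mapping laboratory -> number of target taxa.
    loci:    list of loci every taxon should be sequenced for.
    """
    done = defaultdict(set)
    for lab, taxon, locus in records:
        if locus in loci:
            done[lab].add((taxon, locus))  # a set ignores duplicate reports
    return {
        lab: 100.0 * len(done[lab]) / (n_taxa * len(loci))
        for lab, n_taxa in targets.items()
    }


def problem_loci(records, lineage_taxa, loci, threshold=0.5):
    """Return (lineage, locus) pairs sequenced for < threshold of the taxa."""
    sequenced = {(taxon, locus) for _lab, taxon, locus in records}
    flagged = []
    for lineage, taxa in lineage_taxa.items():
        for locus in loci:
            hits = sum((taxon, locus) in sequenced for taxon in taxa)
            if taxa and hits / len(taxa) < threshold:
                flagged.append((lineage, locus))
    return flagged


# Made-up placeholder records, for illustration only.
records = [
    ("lab_1", "taxon_A", "SSU"),
    ("lab_1", "taxon_A", "LSU"),
    ("lab_1", "taxon_B", "SSU"),
]
loci = ["SSU", "LSU", "RPB2"]

print(percent_complete(records, {"lab_1": 2}, loci))
# {'lab_1': 50.0}
print(problem_loci(records, {"lineage_X": ["taxon_A", "taxon_B"]}, loci))
# [('lineage_X', 'RPB2')]
```

A flagged pair marks a lineage in which a locus may need new PCR primers or a different sequencing strategy.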
Human resource development. Each laboratory will support a postdoctoral research associate and a resident graduate student. In addition, a sixth postdoctoral research associate, whose responsibility will be to assist in developing and maintaining the molecular database, will be supported. Visiting graduate students from participating laboratories will be supported for periods of 3 months to one year depending on the needs of sampling and data acquisition. We anticipate that each laboratory will support 1-4 visiting students per year for a total of 5-20 per year or 20-80 for the duration of the project. If the proposal is funded, each PI will also apply for REU awards for the purpose of involving two undergraduates per year per laboratory in the project. In summary, the number of persons who will gain experience in molecular systematics and phyloinformatics of fungi as part of AFTOL includes 6 postdoctoral research associates, 4 resident graduate students, 20-80 visiting graduate students, and ~40 undergraduate students. Throughout the process of managing this project, the PIs will make every attempt to include the broadest spectrum of race, ethnicity and gender among the participants of AFTOL, as demonstrated by more than 110 collaborators from 23 countries (see letters of support).
Outreach activities. Every summer one workshop will be hosted at each institution for local high school teachers, who will receive training in fundamentals of mycology, emphasizing the diversity of fungi and their ecological and economic importance, as well as recent discoveries in fungal phylogeny. To ensure that all participants receive personalized instruction, workshops will be limited to 6-8 teachers. Thus, the annual goal is to provide training to 24-32 teachers.
Curatorial, computational, sequencing, and informatic facilities and resources.
Curation. All specimens included in the AFTOL project will be vouchered in herbaria and/or culture collections and/or as frozen material at -80°C. Spatafora, McLaughlin, and Lutzoni all curate fungal and/or lichen collections and will provide for any herbarium curatorial needs the project may require. Voucher information, as well as material availability as frozen material, cultures, or herbarium specimens, will be an integral part of the centralized AFTOL database at Duke, which will also be linked to the herbarium databases and to the non-molecular database at the University of Minnesota. Curators of national and international culture collections (Humber - ARSEF, Siefert - ECORC, O'Donnell - NRRL, Geiser - FRC, Stalpers - CBS, Cannon - CABI) are participants in the project and, where appropriate and possible, will provide long-term storage of cultures of specimens included in the study.
Computational. All of the PIs' laboratories are well equipped with personal computers and software capable of state-of-the-art phylogenetic analyses. In addition, Lutzoni and Vilgalys have three servers for the computational needs of this project and more than 10 Mac computers for data visualization, data analyses, and access to various servers. An additional server will be obtained as a cost-share toward this project for the specific use of the central AFTOL database and other potential ATOL projects at Duke. Finally, we will have full access to the computational facilities at the North Carolina Supercomputing Center (see Jack da Silva's letter of support), including an IBM RS/6000 SP with 720 processors.
Sequencing. The four laboratories that are focused on molecular data will all sequence the same core set of seven genes, including the nuclear (nuc) SSU rDNA (~1.7 kb), nuc-LSU rDNA (5' region ~1.4 kb), RPB1 (regions A-G ~2.5 kb), RPB2 (regions 3-11 ~2.5 kb), EF-1a (~1.4 kb), and the mitochondrial locus ATP6 (~0.8 kb). These loci were selected based on the current knowledge of levels of variation across Fungi, existence of data and PCR primers for all genes for at least some subset of Fungi, and the widespread and successful use of ITS rDNA data in identifying species and measuring fungal diversity in environmental sampling. Sequences and PCR primers are available from all four phyla of Fungi for SSU rDNA, LSU rDNA, ITS, and EF-1a and from numerous lineages of the Ascomycota and Basidiomycota for RPB1, RPB2, and ATP6. Based on pre-existing data and the collective expertise in molecular systematics and marker development of the PIs and the numerous participating laboratories, we are confident that sequencing of all loci will be successful. The Vilgalys laboratory will serve as a central sequencing facility for the AFTOL project. All sequencing reactions will be performed in the laboratories of the PIs using ABI Big Dye Terminator sequencing kits in 96-well microtiter plates. A large proportion of the sequencing reactions will then be shipped to the Duke Sequencing Facility to be run on an ABI 3700 capillary automated DNA sequencer. As part of the Duke cost-share for this project and the establishment of the Center for Evolutionary Genomics, a second ABI 3700 will soon be available for this project, as well as additional equipment to alleviate the added molecular workload this project will place on the Lutzoni and Vilgalys labs at Duke. See the description of Duke facilities and the letter of support from the Dean of Natural Sciences at Duke, Dr. Berndt Mueller.
Sampling of non-nucleotide data. The analysis of the non-nucleotide data has two components: 1) compilation of existing data, quality evaluation, and establishment of the searchable database, and 2) acquisition of new data to fill significant gaps in the database. Benefits of the non-molecular study are the compilation and evaluation of more than 40 years of data and their organization in a common format that will provide guidelines for further database development. These data will also provide a uniform source of characters to assist molecular phylogenetic studies and analysis of character evolution. Long term maintenance of the database will be either on the server at the Bell Museum of Natural History, University of Minnesota, or at Duke University.
For the existing data the following characters will be compiled: spindle pole body (SPB) form and cycles, mitotic and meiotic features, septum and septal pore organization, characteristics of vegetative hyphae (Golgi apparatus, Spitzenkörper, other organelles) and specialized cell types, motile cell structure, haustorium, meiospore and meiosporangium, and selected biochemical characters. Characters that will be the focus of new studies will be catalogued first, with priority for other characters to be determined in consultation with the Non-molecular Participants Group. Character quality will be assessed to determine adequacy of existing data. Character states will be defined and coded, images chosen, and data entered into data tables. Responsibilities for data review and entry will be divided between DJM and the postdoctoral fellow, who will work with the Academic and Distributed Services staff, University of Minnesota, to design and test the database. Coded characters will be released to the Non-molecular Participants Group for testing in phylogenetic analysis. Progress will be measured as follows: 1st yr, SPB and nuclear division characters; develop database; 2nd yr, septa, vegetative hyphae and specialized cell types; complete database; 3rd yr, meiospores, meiosporangia, biochemical characters; 4th yr, motile cells, haustoria.
For new data, the main focus will be on SPB, nuclear division, septa, and a specialized cell type, the cystidium. These studies will be the main focus of the postdoctoral fellow and the resident graduate student. Visiting graduate students may pursue these topics or other characters relevant to their research, and will be assisted with specimen processing, data acquisition and evaluation, and character coding. Culturing and cell selection methods will be handled by DJM and postdoctoral fellow; specimen preparation and transmission electron microscopy, by the Imaging Center staff; data evaluation and character coding, by DJM and postdoctoral fellow. High quality specimen preparation will involve a variety of freezing methods or microwave processing. Nuclear division studies are inherently slow, and require finding mitotic/meiotic stages. Multiple cells at each stage are needed. We will plan on two nuclear division studies per year, and 10 less demanding, subcellular studies (e.g. septa, cystidia, other).
Informatics. Six facilities will be available at Duke and in the Research Triangle to help the PIs reach the goals of the proposed AFTOL project. Lutzoni's lab at Duke will host the two postdocs (C. Cox and S. Zoller) who will develop the automated and interactive database for this project. They will be the bridge between the five PIs and the remaining five facilities that will provide assistance and services for this project. The Department of Biology has three full-time technicians and computer scientists to whom Cox and Zoller have daily access. They will be the main personnel resource for purchasing and installing the new server for the AFTOL central database. Kimberly Johnson and Simon Lin, from the Duke Bioinformatics Shared Resource and the Cancer Center Information Systems, are outstanding computer scientists who will provide expertise in the automation of data acquisition and data management (see letter of support from Kimberly Johnson). The Duke Center for Bioinformatics and Computational Biology will provide consulting services for bioinformatics support and biological science database management (see letter of support from John Harer, Vice Provost for Academic Affairs at Duke). Finally, the North Carolina Supercomputing Center (Research Triangle) will provide extensive computer power for phylogenetic analyses, storage facilities, and assistance for maximizing our use of their supercomputers (see letter from Jack da Silva).
Molecular databases. Data will be stored in a common format to facilitate integration with the other institutes and to provide a standards-compliant storage facility for final integration into the larger Tree of Life initiative. All data pertaining to the project will be held in a custom-made Structured Query Language (SQL) database at Duke University. The database server currently deployed by the Dept. of Biology, Duke University, is an open-source, SQL-compliant, Object-Relational Database Management System called PostgreSQL. The open-source nature of PostgreSQL provides a cost-effective (free) and proven solution, while maintaining SQL standards compliance for easy integration into similar database management systems (e.g. Oracle).
The database will provide a central storage facility for data relating to the taxa, specimen voucher information, laboratory codings, and current status of the sequences available for each study taxon. In addition, all sequences generated for the project will be held in the database and referable to the trace electropherograms (trace files will be archived by individual institutions). Centralization of data storage will safeguard data integrity within and among participating institutions.
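A minimal sketch of how the stored items listed above (taxa, voucher information, sequence status, and references to trace files) might map onto relational tables is given below. It is an illustration under stated assumptions, not the AFTOL schema: all table and column names are hypothetical, and Python's built-in `sqlite3` module stands in for the PostgreSQL server so that the example runs without a database installation.

```python
import sqlite3

# Hypothetical table layout; names are illustrative assumptions.
SCHEMA = """
CREATE TABLE taxon (
    taxon_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    laboratory TEXT NOT NULL            -- institution responsible
);
CREATE TABLE voucher (
    voucher_id INTEGER PRIMARY KEY,
    taxon_id   INTEGER NOT NULL REFERENCES taxon(taxon_id),
    collection TEXT NOT NULL,           -- herbarium or culture collection
    material   TEXT NOT NULL
        CHECK (material IN ('frozen', 'culture', 'herbarium'))
);
CREATE TABLE sequence (
    sequence_id INTEGER PRIMARY KEY,
    taxon_id    INTEGER NOT NULL REFERENCES taxon(taxon_id),
    locus       TEXT NOT NULL,
    status      TEXT NOT NULL,          -- e.g. 'planned', 'complete'
    residues    TEXT,                   -- NULL until sequenced
    trace_file  TEXT,                   -- pointer to locally archived trace
    UNIQUE (taxon_id, locus)
);
"""


def open_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES in SQLite
    conn.executescript(SCHEMA)
    return conn


def locus_status(conn, taxon_name):
    """Return {locus: status} for one taxon."""
    rows = conn.execute(
        "SELECT s.locus, s.status FROM sequence s "
        "JOIN taxon t ON t.taxon_id = s.taxon_id "
        "WHERE t.name = ? ORDER BY s.locus",
        (taxon_name,),
    )
    return dict(rows.fetchall())
```

The foreign keys and the `UNIQUE (taxon_id, locus)` constraint show how a single central store can reject orphaned or duplicate records, which is one concrete form of the data integrity that centralization is meant to provide.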
Access to the database will be provided by secure, password-protected SSH web-based interfaces designed specifically for the project. These interfaces will give individual institutions a consistent and current view of the entire project. Each institution will be able to update and manipulate the data for which it is responsible, and to read all information pertaining to the project.
In addition, the interfaces will provide automated retrieval facilities to enable downloading of data in a variety of formats. Specifically, individual sequences will be available in FASTA, GDE Flat, and GENBANK formats, and automatically generated alignments (via ClustalX) of multiple taxon/gene complements will be available in NEXUS format. Storage of a current 'stable' alignment of each data partition will also be available. The ability to generate specific taxon/gene matrices will enable rapid monitoring of the current phylogenetic hypotheses and provide a quality check on the available data. A prototype of the project management system is currently deployed at Duke and provides a similar data storage facility and data management interface to coordinate a large-scale sequencing project undertaken in collaboration with the University of Connecticut.
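As a rough illustration of the retrieval step, the sketch below formats one stored sequence as a FASTA record and builds a taxon-by-gene availability matrix. The function names, the `(taxon, locus)` keying, and the placeholder identifiers are assumptions made for this example; the GDE, GENBANK, and NEXUS exports are not shown.

```python
def to_fasta(identifier, residues, width=60):
    """Format one sequence as a FASTA record with wrapped sequence lines."""
    lines = [f">{identifier}"]
    lines.extend(residues[i:i + width] for i in range(0, len(residues), width))
    return "\n".join(lines) + "\n"


def presence_matrix(sequences, taxa, loci):
    """Map taxon -> {locus: True/False} for the requested taxa and loci.

    sequences: dict keyed by (taxon, locus) holding the residue strings.
    """
    return {
        taxon: {locus: (taxon, locus) in sequences for locus in loci}
        for taxon in taxa
    }


# Made-up placeholder data, for illustration only.
sequences = {("taxon_A", "SSU"): "ACGTACGTAC"}

print(to_fasta("taxon_A|SSU", sequences[("taxon_A", "SSU")], width=4), end="")
# >taxon_A|SSU
# ACGT
# ACGT
# AC
print(presence_matrix(sequences, ["taxon_A", "taxon_B"], ["SSU", "LSU"]))
# {'taxon_A': {'SSU': True, 'LSU': False}, 'taxon_B': {'SSU': False, 'LSU': False}}
```

A matrix of this kind makes it easy to choose the taxon/gene subset that is complete enough to align and analyze at any point in the project.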
Non-molecular databases. The non-molecular data will be stored in a structured database in a relational format that can be queried for any combination of data attributes. The data will include all of the information necessary for later analyses. MySQL will be used locally for database management. Periodic exports will be made and imported into a central Oracle server, provided by University of Minnesota Java and Web Services (JAWS), from which the data will be web accessible. Our web interface to the database will be implemented using ODBC programmed with Perl CGI or Java Server Pages using SQL. We will implement the web design initially with programming assistance from the JAWS staff at the University, and will train a graduate student and a postdoctoral fellow at training workshops run by the University to update and modify the database. The primary relational table will contain the character values for each species. In addition, two auxiliary tables containing the metadata will be stored, one with data on the species and the second with data on the characters. Examples of auxiliary tables and primary data are provided on the Deep Hypha web site.
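The three-table layout described above (one primary table of character values, plus auxiliary tables for species and character metadata) can be sketched as follows. This is an illustrative assumption rather than the project's design: table, column, and function names are hypothetical, and Python's built-in `sqlite3` module is used in place of MySQL or Oracle so that the example is self-contained.

```python
import sqlite3

# Hypothetical layout: two metadata tables and one primary table.
SCHEMA = """
CREATE TABLE species_meta (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE character_meta (
    character_id INTEGER PRIMARY KEY,
    label        TEXT NOT NULL UNIQUE,
    description  TEXT
);
CREATE TABLE character_value (
    species_id   INTEGER NOT NULL REFERENCES species_meta(species_id),
    character_id INTEGER NOT NULL REFERENCES character_meta(character_id),
    state        TEXT NOT NULL,
    PRIMARY KEY (species_id, character_id)  -- one coded state per cell
);
"""


def open_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def species_matching(conn, criteria):
    """Names of species that show every (character label -> state) pair."""
    if not criteria:
        return []
    clauses = " OR ".join("(c.label = ? AND v.state = ?)" for _ in criteria)
    params = [item for pair in criteria.items() for item in pair]
    sql = (
        "SELECT s.name FROM character_value v "
        "JOIN species_meta s ON s.species_id = v.species_id "
        "JOIN character_meta c ON c.character_id = v.character_id "
        f"WHERE {clauses} "
        "GROUP BY s.species_id, s.name "
        "HAVING COUNT(*) = ? "          # every requested pair must match
        "ORDER BY s.name"
    )
    return [row[0] for row in conn.execute(sql, params + [len(criteria)])]
```

Because each species has at most one state per character, counting matched rows per species is enough to require that all requested character states are present, which is how a query on any combination of attributes can be answered from the primary table.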
Coordination with foreign-based projects.
The PIs of the coordinating laboratories have been proactive in soliciting participation in the AFTOL project from international researchers. Twenty-three countries from all continents are represented among the participants. International participants will be involved in acquisition of material, and students from their laboratories will be eligible for the graduate student training aspects of AFTOL. Many of these laboratories have ongoing projects on specific groups of fungi. Where that is true, the taxon overlap between their research and AFTOL will be minimal. In addition, most independent research projects are limited to one or two genes. The AFTOL project will sequence seven genes and will thus represent an invaluable resource for continued and expanded multi-gene datasets at lower taxonomic levels.