<1>. Sequence Analysis and Prediction of LLPS Properties.
1. D2P2: A community resource for pre-computed disorder predictions on a large library of proteins from completely sequenced genomes. Goals of the database include making statistical comparisons of the various prediction methods freely available to the prediction community, as well as facilitating biological investigation of the disordered protein space (Oates, et al., 2013).
2. IUPred2A: A combined web interface that identifies disordered protein regions using IUPred2 and disordered binding regions using ANCHOR2 (Mészáros, et al., 2018).
3. MobiDB 3.0: Annotations of intrinsic protein disorder and its function (Piovesan, et al., 2018).
4. DisProt 7.0: A community resource annotating protein sequences for intrinsically disordered regions from the literature (Piovesan, et al., 2017).
5. PLAAC: A web and command-line application to identify proteins with prion-like amino acid composition (Lancaster, et al., 2014).
6. LCR-eXXXplorer: A web platform to search, visualize and share data for low-complexity regions in protein sequences (Kirmitzoglou, et al., 2015).
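LCR-eXXXplorer (item 6) deals with low-complexity regions (LCRs). One common way to make "low complexity" concrete, shown here only as an illustrative sketch and not as LCR-eXXXplorer's own method, is to slide a fixed window along the sequence and flag windows whose Shannon entropy H = -Σ p_i log2 p_i (p_i being the fraction of residue type i in the window) falls below a cutoff. The window length (12), the 2.2-bit threshold and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of one window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged, 0-based, half-open (start, end) spans covered by
    windows whose entropy is below `threshold`.

    Hypothetical helper: window size and threshold are illustrative
    assumptions, not LCR-eXXXplorer parameters.
    """
    seq = seq.upper()
    regions = []
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) < threshold:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlapping low-entropy windows extend the current region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # A poly-Q tract flanked by two copies of the 20 standard residues.
    flank = "ACDEFGHIKLMNPQRSTVWY"
    seq = flank + "Q" * 15 + flank
    print(low_complexity_regions(seq))
```

Run on the example, this prints `[(15, 40)]`. The flagged span is wider than the poly-Q tract itself (positions 20 to 34) because, with these settings, any 12-residue window that is at least 7/12 glutamine already falls below the cutoff; window size and threshold trade sensitivity against boundary precision.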
<2>. Public Resources Integrated in DrLLPS
(1) Intrinsically Disordered Region:
1. D2P2: A community resource for pre-computed disorder predictions on a large library of proteins from completely sequenced genomes. Goals of the database include making statistical comparisons of the various prediction methods freely available to the prediction community, as well as facilitating biological investigation of the disordered protein space (Oates, et al., 2013).
2. IUPred2A: A combined web interface that identifies disordered protein regions using IUPred2 and disordered binding regions using ANCHOR2 (Mészáros, et al., 2018).
3. MobiDB 3.0: Annotations of intrinsic protein disorder and its function (Piovesan, et al., 2018).
4. DisProt 7.0: A community resource annotating protein sequences for intrinsically disordered regions from the literature (Piovesan, et al., 2017).

1. PLAAC: A web and command-line application to identify proteins with prion-like amino acid composition (Lancaster, et al., 2014).
2. AmyPro: A database of proteins with validated amyloidogenic regions (Varadi, et al., 2018).
3. SMART: A web resource for the identification and annotation of protein domains and the analysis of protein domain architectures (Letunic, et al., 2015).
4. Pfam: A widely used database of protein families, containing 14,831 manually curated entries (Finn, et al., 2016).
5. InterPro: A freely available database used to classify protein sequences into families and to predict the presence of important domains and sites (Finn, et al., 2017).
6. LCR-eXXXplorer: A web platform to search, visualize and share data for low-complexity regions in protein sequences (Kirmitzoglou, et al., 2015).
7. CDD: NCBI's Conserved Domain Database (Marchler-Bauer, et al., 2015).
8. PRINTS: A collection of diagnostic protein family "fingerprints" (Attwood, et al., 2012).
9. PROSITE: Documentation entries describing protein domains, families and functional sites, together with associated patterns and profiles to identify them (Sigrist, et al., 2013).
10. Gene3D: A database of globular domain annotations for millions of available protein sequences (Lewis, et al., 2018).

(3) Post-translational Modification:
1. EPSD: The Eukaryotic Phosphorylation Site Database (EPSD) is a comprehensive data resource updated from two databases, dbPPT and dbPAF, which contained 82,175 p-sites from 20 plants and 483,001 p-sites from 7 animals and fungi, respectively.
2. PLMD: Contains 284,780 modification events in 53,501 proteins (Xu, et al., 2017).
3. PhosphoSitePlus: Contains over 330,000 non-redundant PTMs, including phospho, acetyl, ubiquityl and methyl modifications (Hornbeck, et al., 2015).
4. dbPTM 2019: Curates over 12,000 modified peptides, including the emerging S-nitrosylation, S-glutathionylation and succinylation modifications (Huang, et al., 2016).
5. HPRD: Comprises 95,016 phosphosites mapped onto 13,041 proteins (Goel, et al., 2012).
6. BioGRID: Contains 1,072,173 genetic and protein interactions and 38,559 post-translational modifications (Chatr-Aryamontri, et al., 2017).
7. PhosphoBase: Comprises 414 phosphoprotein entries covering 1,052 phosphorylatable serine, threonine and tyrosine residues (Kreegipuu, et al., 1999).
8. mUbiSiDa: Contains about 35,494 experimentally validated ubiquitinated proteins with 110,976 ubiquitination sites from five species (Chen, et al., 2014).
9. PTMcode: A resource for functional associations of post-translational modifications within and between proteins (Minguez, et al., 2015).

1. dbSNP: The NCBI database of genetic variation (Sherry, et al., 2001).
2. VarCards: Interpretation of coding variants in the human genome (Li, et al., 2018).
3. rSNPBase 3.0: An updated database of SNP-related regulatory elements, element–gene pairs and SNP-based gene regulatory networks (Guo, et al., 2018).
4. Kin-Driver: A database of driver mutations in protein kinases (Simonetti, et al., 2014).

1. TCGA: The Cancer Genome Atlas (TCGA) has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer. The TCGA dataset, 2.5 petabytes of data describing tumor tissue and matched normal tissues from more than 11,000 patients, is publicly available (Cancer Genome Atlas Research Network, 2017).
2. ICGC: The Data Portal currently contains data from 24 cancer projects, comprising 3,478 genomes from 13 cancer types and subtypes (Zhang, et al., 2011).
3. COSMIC: Describes 2,002,811 coding point mutations in over one million tumor samples and across most human genes (Forbes, et al., 2014).
4. CGAP: The Cancer Genome Anatomy Project (CGAP) is an online database of normal, pre-cancerous and cancerous genomes (Schaefer, et al., 2001).
5. IntOGen: Integration and data mining of multidimensional oncogenomic data (Gundem, et al., 2010).
6. BioMuta: A mutation knowledgebase for cancer biomarker discovery (Dingerdissen, et al., 2018).

1. HINT: High-quality protein interactomes and their applications in understanding human disease (Das, et al., 2012).
2. InWeb_IM: A scored human protein–protein interaction network with >500,000 interactions, applied to the functional interpretation of >4,700 cancer genomes and of genes involved in autism (Li, et al., 2017).
3. RISE: A database of RNA interactome from sequencing experiments (Gong, et al., 2018).
4. TRRUST v2: An expanded reference database of human and mouse transcriptional regulatory interactions (Han, et al., 2018).
5. TIMBAL v2: A database of molecules with a molecular weight below 1,200 Da that modulate protein–protein interactions (Higueruelo, et al., 2013).
6. BindingDB: A public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be drug targets with small, drug-like molecules. BindingDB contains 1,447,692 binding measurements for 7,058 protein targets and 648,871 small molecules (Gilson, et al., 2016).
7. PLIC: Protein–ligand interaction clusters (Anand, et al., 2014).
8. RAIN: RNA–protein association and interaction networks (Junge, et al., 2017).
9. Mentha: A resource for browsing integrated protein-interaction networks (Calderone, et al., 2013).
10. YTRP: Aims to find the TRP information for TF–gene regulatory pairs identified by TFPE (Yang, et al., 2014).
11. RegNetwork: Gene regulatory networks for human and mouse, built by collecting the documented regulatory interactions among TFs, miRNAs and target genes (Liu, et al., 2015).
12. PINA: Includes multiple collections of interaction modules identified by different clustering approaches from the whole network of protein interactions ("interactome") for six model organisms (Cowley, et al., 2012).
13. IID: A major replacement of the I2D interaction database, with larger PPI networks (a total of 1,566,043 PPIs among 68,831 proteins) (Kotlyar, et al., 2016).
14. iRefIndex: A consolidated protein interaction database with provenance (Razick, et al., 2008).
15. MIST: The Molecular Interaction Search Tool, an integrated resource for mining gene and protein interaction data (Hu, et al., 2018).
16. DifferentialNet: A database of differential protein–protein interactions in human tissues (Basha, et al., 2018).
17. RNAct: Protein–RNA interaction predictions for model organisms with supporting experimental data (Lang, et al., 2019).

(7) Disease-associated Information:
1. ClinVar: A public archive of reports of the relationships among human variations and phenotypes, with supporting evidence (Landrum, et al., 2018).
2. PTMD: Contains 1,950 disease-associated PTM events in 749 proteins for 24 PTM types and 275 diseases (Xu, et al., 2018).
3. MSDD: The miRNA SNP Disease Database (Yue, et al., 2018).
4. BRONCO: The Biomedical entity Relation ONcology COrpus (BRONCO) contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research (Lee, et al., 2016).
5. HGVTB: HGV&TB hosts genetic variations reported to be associated with TB susceptibility in humans. It currently houses information on 307 variations in 98 genes; 101 of these variations are exonic and 78 fall in intronic regions (Sahajpal, et al., 2014).
6. PancanQTL: Systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types (Gong, et al., 2018).
7. TumorFusions: An integrative resource for cancer-associated transcript fusions (Hu, et al., 2018).
8. LncRNADisease: Contains 2,947 lncRNA–disease entries with controlled lncRNA and disease nomenclature (Chen, et al., 2013).
9. OMIM: A comprehensive, authoritative and timely research resource of curated descriptions of human genes and phenotypes and the relationships between them (Amberger, et al., 2015).
10. DisGeNET: A comprehensive platform integrating information on human disease-associated genes and variants (Furlong, et al., 2017).
11. GWAS Central: A comprehensive resource for the comparison and interrogation of genome-wide association studies (Beck, et al., 2016).
12. GWASdb: Contains 252,530 unique trait-associated SNPs (TASs) and maps 1,610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms (Li, et al., 2016).
13. DiseaseEnhancer: A catalog of human disease-associated enhancers (Zhang, et al., 2018).
14. ActiveDriverDB: Genome variation mapped against post-translational modifications (Krassowski, et al., 2018).

1. DrugBank: Contains 9,591 drug entries, including 2,037 FDA-approved small-molecule drugs, 241 FDA-approved biotech (protein/peptide) drugs, 96 nutraceuticals and over 6,000 experimental drugs (Law, et al., 2014).
2. TTD: Contains 2,025 targets (364 successful, 286 clinical trial, 44 discontinued and 1,331 research targets), 17,816 drugs (1,540 approved, 1,423 clinical trial and 14,853 experimental drugs) and 3,681 multi-target agents (Zhu, et al., 2012).
3. KPID: A searchable database of the specificities of 243 commonly used signal transduction inhibitors.
4. GRAC: Provides pharmacological, chemical, genetic, functional and pathophysiological data on the targets of approved and experimental drugs (Pawson, et al., 2014).
5. ADReCS-Target: Provides comprehensive information for illustrating adverse drug reactions (ADRs) caused by drug interactions with proteins, genes and genetic variation (Zhang, et al., 2007).
6. ECOdrug: A database connecting drugs and the conservation of their targets across species (Verbruggen, et al., 2018).
7. DGIdb 3.0: A redesign and expansion of the drug–gene interaction database (Cotto, et al., 2018).
8. CTD: The Comparative Toxicogenomics Database; in a CTD–Pfizer collaboration, 88,000 scientific articles text-mined for drug–disease and drug–phenotype interactions were manually curated (Davis, et al., 2013).

1. Compute pI/Mw: A tool that computes the theoretical pI (isoelectric point) and Mw (molecular weight) for a list of UniProt Knowledgebase (Swiss-Prot or TrEMBL) entries or for user-entered sequences (Wilkins, et al., 1999).
2. AAindex: A database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids (Kawashima, et al., 2008).

(10) Protein Functional Annotation:
1. THANATOS: An integrative data resource of proteins and post-translational modifications in the regulation of autophagy (Deng, et al., 2018).
2. iEKPD: Contains 197,348 phosphorylation regulators, including 109,912 protein kinases, 23,294 protein phosphatases and 68,748 PPBD-containing proteins in 164 eukaryotic species (Guo, et al., 2019).
3. CGDB: A database of circadian genes in eukaryotes (Li, et al., 2017).
4. PIRSF: The PIRSF protein classification system reflects evolutionary relationships of full-length proteins and domains (Nikolskaya, et al., 2007).
5. ENPD: An online library of nucleic acid binding proteins (NBPs) and their functional information (Tak Leung, et al., 2019).
6. CORUM: The comprehensive resource of mammalian protein complexes (Giurgiu, et al., 2019).
7. CellMarker: A manually curated resource of cell markers in human and mouse (Zhang, et al., 2019).
8. RaftProt: Mammalian lipid raft proteome database (Shah, et al., 2015).

(11) Protein Expression/Proteomics:
1. The Human Protein Atlas: 11,200 unique proteins, corresponding to over 50% of all human protein-encoding genes, have been analysed (Pontén, et al., 2011).
2. Human Proteome Map: Covers 30 histologically normal human samples, resulting in the identification of proteins encoded by 17,294 genes (Kim, et al., 2014).

1. PDB: Contains 41,599 distinct protein sequences, 36,830 structures of human sequences and 9,465 nucleic acid-containing structures (Berman, et al., 2000).
2. MMDB: Close to 60% of protein sequences tracked in comprehensive databases can be mapped to a known three-dimensional (3D) structure by standard sequence similarity searches (Madej, et al., 2012).
3. SCOP: A prototype of a new structural classification of proteins (Andreeva, et al., 2014).

(13) Subcellular Localization:
1. COMPARTMENTS: Unification and visualization of protein subcellular localization evidence (Binder, et al., 2014).
2. NLSdb: Nuclear localization signals (Bernhofer, et al., 2018).

1. TCGA: The Cancer Genome Atlas (TCGA) has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer. The TCGA dataset, 2.5 petabytes of data describing tumor tissue and matched normal tissues from more than 11,000 patients, is publicly available (Cancer Genome Atlas Research Network, 2017).
2. ICGC: The Data Portal currently contains data from 24 cancer projects, comprising 3,478 genomes from 13 cancer types and subtypes (Zhang, et al., 2011).
3. COSMIC: Describes 2,002,811 coding point mutations in over one million tumor samples and across most human genes (Forbes, et al., 2014).
4. GEO: NCBI's gene expression and hybridization array data repository (Edgar, et al., 2002).
5. ArrayExpress: A public repository for microarray-based gene expression data, which implements the MAGE object model to ensure accurate data structuring and the MIAME standard, which defines the annotation requirements (Rocca-Serra, et al., 2003).
6. GXD: Includes >1.4 million expression results and 250,000 images (Smith, et al., 2013).
7. TissGDB: Tissue-specific Gene DataBase in cancer (Kim, et al., 2018).
8. TISSUES 2.0: An integrative web resource on mammalian tissue expression (Palasca, et al., 2018).
9. BioXpress: An expression knowledgebase for cancer biomarker discovery (Dingerdissen, et al., 2018).
10. Human Proteome Map: Covers 30 histologically normal human samples, resulting in the identification of proteins encoded by 17,294 genes (Kim, et al., 2014).
11. The Human Protein Atlas: 11,200 unique proteins, corresponding to over 50% of all human protein-encoding genes, have been analysed (Pontén, et al., 2011).

1. UTRdb: A curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs (Grillo, et al., 2010).
2. circBase: A database for circular RNAs (Glažar, et al., 2014).
3. circRNADb: Contains 32,914 human exonic circRNAs carefully selected from diversified sources (Chen, et al., 2016).
4. CircNet: Provides the expression of circRNAs in 464 RNA-seq samples (Liu, et al., 2016).
5. miRTarBase: Contains 4,966 articles, 7,439 strongly validated miRNA–target interactions (MTIs; supported by reporter assays or western blots) and 348,007 MTIs from CLIP-seq (Chou, et al., 2011).
6. miRWalk: Offers information on miRNAs, genes, epigenomics, pathways, ontologies, protein classes, phenotype, genotype, single-nucleotide polymorphisms, functional networks, tandem mass spectra and relevant PubMed articles (Dweep, et al., 2015).
7. TargetScan: Predicts effective microRNA target sites in mammalian mRNAs (Agarwal, et al., 2015).
8. miRecords: Includes 1,135 records of validated miRNA–target interactions between 301 miRNAs and 902 target genes in seven animal species (Xiao, et al., 2009).
9. SomamiR DB 2.0: 388,247 somatic mutations were mapped to experimentally identified miRNA target sites (Bhattacharya, et al., 2016).
10. miRcode: Includes 10,419 lncRNA genes in the current version (Jeggari, et al., 2012).
11. RAID v2.0: Contains more than 5.27 million RNA-associated interactions, involving nearly 130,000 RNA/protein symbols across 60 species (Yi, et al., 2017).
12. OverGeneDB: Overlapping protein-coding genes (Rosikiewicz, et al., 2018).
13. SEA: A super-enhancer archive (Wei, et al., 2016).
14. microRNA.org: A comprehensive resource of microRNA target predictions and expression profiles (Betel, et al., 2008).
15. m6AVar: A database of functional variants involved in m6A modification (Zheng, et al., 2018).

1. TCGA: The Cancer Genome Atlas (TCGA) has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer. The TCGA dataset, 2.5 petabytes of data describing tumor tissue and matched normal tissues from more than 11,000 patients, is publicly available (Cancer Genome Atlas Research Network, 2017).
2. ICGC: The Data Portal currently contains data from 24 cancer projects, comprising 3,478 genomes from 13 cancer types and subtypes (Zhang, et al., 2011).
3. COSMIC: Describes 2,002,811 coding point mutations in over one million tumor samples and across most human genes (Forbes, et al., 2014).
4. MethyCancer: Hosts both highly integrated data on DNA methylation, cancer-related genes, mutations and cancer information from public resources, and CpG island (CGI) clones derived from the database authors' large-scale sequencing (He, et al., 2008).

1. Ensembl: A database and genome browser for enabling research on vertebrate genomes (Aken, et al., 2017).
2. UniProt: The Universal Protein Knowledgebase (UniProt Consortium, 2018).
3. GenBank: A comprehensive database that contains publicly available nucleotide sequences for almost 260,000 formally described species (Benson, et al., 2013).
4. GO: The Gene Ontology (GO) project in 2006 (Gene Ontology Consortium, 2006).
5. KEGG: A knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules (Ogata, et al., 1999).
6. RESID: The RESID Database of Protein Modifications as a resource and annotation tool (Garavelli, 2004).
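The Compute pI/Mw entry in the list above (Wilkins, et al., 1999) reports a protein's theoretical isoelectric point and molecular weight. As a rough sketch of what those two numbers mean, and not a reimplementation of that tool, the Python below sums average residue masses plus one water for Mw, and finds pI by bisection on a Henderson-Hasselbalch net-charge function (net charge falls monotonically as pH rises). The pKa table is an illustrative assumption that differs from the values the tool itself uses, so results will not match it exactly; all names are hypothetical.

```python
# Average residue masses in Da (free amino-acid mass minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini

# Illustrative pKa values (assumption: a textbook-style set,
# NOT the values used by the Compute pI/Mw tool).
PKA_POSITIVE = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average Mw of an unmodified linear peptide; non-standard letters raise KeyError."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the peptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_term"] = counts["C_term"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on [0, 14] for the pH at which the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive, so the pI lies above mid
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for peptide in ("GG", "DDDDD", "KKKKK"):
        print(peptide, round(molecular_weight(peptide), 2), round(isoelectric_point(peptide), 2))
```

Bisection is used because the net charge is a monotonic function of pH, so a sign change brackets exactly one root; the sketch ignores post-translational modifications, disulfide bonds and position-dependent pKa shifts, all of which real tools may account for.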