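Bioinformatics-Resources is a GitHub project I created to collect bioinformatics-related software and databases that I come across while reading the literature, browsing GitHub, and maintaining the BioInstaller package.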

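One person's effort is limited, so I hope we can build this collection together. If you would like to contribute to this list, you can star and fork my repository and then open a Pull Request; your name will be added at the end of the list.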

If possible, I would like to crawl the tools listed on omictools :(

Abstract: A curated list of resources for learning bioinformatics. Some of the resources in this repo were collected by the BioInstaller project. You can use BioInstaller to download the source code or database files directly, or fetch the meta information with BioInstaller::get.meta()$item.

Purpose:

  • Provide bioinformatics learning resources for beginners
  • Provide an overview of bioinformatics

Field:

  • Next generation sequencing (NGS)
  • Bioinformatics Data Analysis

Resources

General

  • Wikipedia
  • Org

Journal

  • Nature Methods
  • Nature Genetics
  • Bioinformatics
  • BMC Bioinformatics
  • Nucleic Acids Research
  • Genome Research

Sequencing Technology

This section is mainly copied from Enseqlopedia.

Thanks to this work: Hadfield, J. & Retief, J. A profusion of confusion in NGS methods naming. Nat Methods 15, 7-8 (2018).

RNA Sequencing Methods

Low-Level RNA Detection

  • CEL-Seq
  • CirSeq
  • CLaP
  • CytoSeq
  • Digital RNA Sequencing
  • DP-Seq
  • Drop-Seq
  • Hi-SCL
  • InDrop
  • MARS-Seq
  • Nuc-Seq
  • PAIR
  • Quartz-Seq
  • scM&T-Seq
  • SCRB-Seq
  • scRNA-Seq
  • scTrio-seq
  • Smart-Seq
  • Smart-Seq2
  • snRNA-Seq
  • STRT-Seq
  • SUPeR-Seq
  • TCR-LA-MC PCR
  • TIVA
  • UMI
  • 5C
  • Div-Seq
  • FRISCR
  • TCR Chain Pairing
  • AbPair

RNA Modifications

  • ICE
  • MeRIP-Seq
  • miCLIP-m6A
  • Pseudo-Seq
  • PSI-Seq

RNA Structure

  • CAP-seq
  • Cap-Seq
  • CIP-TAP
  • PARS-Seq
  • SPARE
  • Structure-Seq/DMS-Seq
  • CIRS-Seq
  • icSHAPE
  • SHAPE-MaP
  • SHAPE-Seq

RNA Transcription

  • 2P-Seq
  • 3NT Method
  • 3P-Seq
  • 3Seq
  • 3′-Seq
  • 5′-GRO-Seq
  • BruChase-Seq
  • BruDRB-Seq
  • Bru-Seq
  • CAGE
  • CHART
  • ChIRP
  • ClickSeq
  • GRO-seq
  • NET-Seq
  • PAL-Seq
  • PARE-Seq
  • PEAT
  • PRO-Cap
  • PRO-Seq
  • RAP
  • RARseq
  • RASL-Seq
  • RNA-Seq
  • SMORE-Seq
  • TAIL-Seq
  • TATL-Seq
  • TIF-Seq
  • TL-Seq
  • 4sUDRB-Seq
  • CaptureSeq
  • cP-RNA-Seq
  • FRT-Seq
  • GMUCT
  • mNET-Seq

RNA-Protein Interactions

  • AGO-CLIP
  • CLASH
  • CLIP-Seq or HITS-CLIP
  • DLAF
  • eCLIP
  • hiCLIP
  • iCLIP
  • miR-CLIP
  • miTRAP
  • PAR-CLIP
  • PIP-Seq
  • Pol II CLIP
  • RBNS
  • Ribo-Seq or ARTSeq
  • RIP-Seq
  • TRAP-Seq
  • TRIBE
  • BrdU-CLIP
  • HiTS-RAP
  • irCLIP

DNA Sequencing Methods

Protein-Protein Interaction

  • PD-Seq
  • ProP-PD/PDZ-Seq

Sequence Rearrangements

  • 2b-RAD
  • CPT-seq
  • ddRADseq
  • Digenome-seq
  • EC-seq
  • hyRAD
  • RAD-Seq
  • Rapture
  • RC-Seq
  • Repli-Seq
  • SLAF-seq
  • TC-Seq
  • Tn-Seq/INSeq
  • Bubble-Seq
  • NSCR
  • NS-Seq
  • Rep-Seq/Ig-Seq/MAF

DNA Break Mapping

  • BLESS
  • DSB-Seq
  • GUIDE-seq
  • HTGTS
  • LAM-HTGTS
  • Break-seq
  • SSB-Seq

DNA Protein Interactions

  • DNaseI Seq or DNase-Seq
  • Pu-seq
  • 3-C/Capture-C/Hi-C
  • 4C-seq
  • 5C
  • ATAC-Seq/Fast-ATAC
  • CATCH_IT
  • Chem-seq
  • ChIA-PET
  • ChIPmentation
  • ChIP-Seq/HT-ChIP/ChIP-exo/Mint-ChIP
  • DamID
  • DNase I SIM
  • FAIRE-seq/Sono-Seq
  • FiT-Seq
  • HiTS-FLIP
  • MINCE-seq
  • MNase-Seq/MAINE-Seq/Nucleo-Seq/Nuc-seq
  • MPE-seq
  • NG Capture-C
  • NOMe-Seq
  • ORGANIC
  • PAT-ChIP
  • PB_seq
  • SELEX or SELEX-seq / HT-SELEX
  • THS-seq
  • UMI-4C
  • X-ChIP-seq

Epigenetics

  • Aba-seq
  • BisChIP-Seq/ChIP-BS-Seq/ChIP-BMS
  • BSAS
  • BSPP
  • BS-Seq/Bisulfite-Seq/WGBS
  • CAB-Seq
  • EpiRADseq
  • fCAB-seq
  • fC-CET
  • fC-Seal
  • hMeDIP-seq
  • JBP1-seq
  • MAB-seq
  • MBDCap-seq/MethylCap-Seq/MiGS
  • MeDIP-Seq/DIP-seq
  • MIRA
  • MRE-Seq and Methyl-Seq
  • xBS-Seq
  • PBAT
  • redBS-Seq/caMAB-seq
  • RRBS-Seq
  • RRMAB-seq
  • TAB-Seq
  • TAmC-Seq
  • T-WGBS

Low-Level DNA Detection

  • Safe-SeqS
  • scAba-seq
  • scATAC-Seq (Cell index variation)
  • scATAC-Seq (Microfluidics variation)
  • scBS-Seq
  • scM&T-Seq
  • scRC-Seq
  • SMDB
  • smMIP
  • G&T-Seq
  • 5C
  • DR-Seq
  • MALBAC
  • MDA
  • MIDAS/IMS-MDA/ddMDA
  • Drop-ChIP/scChIP-seq
  • Duplex-Seq
  • MIPSTR
  • nuc-seq/SNES
  • OS-Seq

Tools

Package management

  • conda
  • Bioconductor
  • CRAN
  • CPAN
  • PyPI
  • npm
  • bower
  • gradle
  • ant
  • maven
  • Spack

Web Application Development Framework

  • Galaxy
  • Bootstrap
  • Django
  • Yi

Web-based Service

  • UCSC
  • NCBI
  • CDD
  • ExPASy
  • EMBL-EBI
  • TCGA
  • COSMIC
  • St. Jude PeCan Data Portal
  • BIG Data Center
  • DAVID Bioinformatics Resources
  • cBioPortal
  • Oncoprinter
  • MutationMapper
  • Oncotator
  • QIAGEN Analysis Platform
  • Wordcloud
  • Omictools
  • iCoMut
  • UniProt
  • Pfam
  • SMART
  • STRING
  • DiseaseEnhancer
  • SEECancer
  • eQTL Browser
  • Cistrome Project
  • Cistrome Data Browser
  • Cistrome Cancer
  • Chromatin Regulator Cistrome
  • TIMER
  • VarCards
  • superdrug2
  • MeDReaders
  • ECOdrug
  • rSNPBase3.0
  • MNDR
  • MSDD
  • funcoup
  • proteinatlas
  • DGIdb
  • Drugbank
  • InterPro
  • ncbi-biosystems
  • denovo-db
  • The Human Phenotype Ontology (HPO)
  • FANTOM
  • dbNSFP
  • regSNP-intron
  • RADAR
  • DARNED
  • REDIportal
  • LNCediting
  • EggNOG
  • MiSTIC
  • DTMiner
  • PDBFlex
  • Cancer3d
  • Dsysmap
  • CBS Prediction Servers
  • wANNOVAR: Public web service of ANNOVAR
  • Harmonizome: Search for genes or proteins and their functional terms extracted and organized from over a hundred publicly available resources
  • GDA: A web-based tool that combines NCI60 uniquely large number of drug sensitivity data with CCLE and NCI60 gene mutation and expression profiles
  • CLUE: Unravel biology with the world's largest perturbation-driven gene expression dataset
  • CMAP: The Connectivity Map (also known as cmap) is a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules and simple pattern-matching algorithms that together enable the discovery of functional connections between drugs, genes and diseases through the transitory feature of common gene-expression changes.
  • pssmsearch: a web application to discover novel protein motifs (SLiMs, mORFs, miniMotifs) and PTM sites
  • bammmotif: Bayesian Markov Models (BaMMs), a web server for de-novo motif discovery and regulatory sequence analysis
  • LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis
  • GeNets: a unified web platform for network-based genomic analyses
  • HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization
  • paintomics: a web resource for the pathway analysis and visualization of multi-omics data
  • kinact: a computational approach for predicting activating missense mutations in protein kinases
  • VAReporter: VAReporter can provide comprehensive annotation by integrating a wide variety of biomedical databases
  • SNPnexus: SNPnexus was designed to simplify and assist in the selection of functionally relevant Single Nucleotide Polymorphisms (SNP) for large-scale genotyping studies of multifactorial disorders

Clinical Annotation

  • CIViC
  • DoCM
  • ClinVar
  • Intogen
  • Cancer Hotspots
  • DisGeNET
  • Cancer Biomarkers database
  • OncoKB: Precision Oncology Knowledge Base
  • LncRNADisease: Not only a resource that curates experimentally supported lncRNA-disease association data but also a platform that integrates tool(s) for predicting novel lncRNA-disease associations

Noncoding RNA Related Database

  • CSCD
  • AtCircDB
  • CircNet
  • circBase
  • circRNADb
  • exoRBase
  • EVLncRNAs
  • NONCODE: an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs)
  • MiTranscriptome: a catalog of human long poly-adenylated RNA transcripts derived from computational analysis of high-throughput RNA sequencing (RNA-Seq) data from over 6,500 samples spanning diverse cancer and tissue types
  • FANTOM CAT: an atlas of human long non-coding RNAs with accurate 5′ ends

eQTL Related Database

  • exsnp
  • rVarBase
  • seeQTL

Sequencing Data Portal

  • GDC
  • EGA
  • dbGaP
  • DDBJ
  • GEO
  • ICGC

Local tools

Quality Control

  • FastQC
  • PRINSEQ
  • SolexaQA
  • fastx_toolkit
  • picard
  • ngsqctoolkit
  • MultiQC
  • mosdepth
  • fastp
  • ChronQC

Alignment And Assembly

  • BWA
  • STAR
  • TMAP
  • NovoAlign
  • GMAP
  • bowtie
  • bowtie2
  • tophat2
  • hisat2
  • Edena
  • ABySS
  • SSAHA2
  • oases
  • Velvet
  • Trinity
  • MapSplice2
  • RUM
  • MECAT
  • DART
  • rHAT
  • taxmaps: large DNA/RNA metagenomics samples
  • MARVEL: consists of a set of tools that facilitate the overlapping, patching, correction and assembly of noisy (not so noisy ones as well) long reads.
  • vg: tools for working with genome variation graphs

Variant Detection (SNVs, INDELs, SVs)

  • GATK
  • MuTect
  • lofreq
  • VarScan2
  • freebayes
  • TVC
  • SomaticSniper
  • speedseq
  • FusionCatcher
  • svtoolkit
  • pindel
  • breakdancer
  • delly
  • CNVkit
  • GRIDSS
  • PancanQTL
  • TumorFusions
  • SVScore
  • SVTools
  • RDDpred
  • iseq
  • deepvariant
  • SV2
  • facets
  • MutScan
  • svaba: structural variation and indel detection by local assembly
  • manta: structural variant and indel caller using mapped sequencing data
  • JAFFA: a multi-step pipeline that takes either raw RNA-Seq reads, or pre-assembled transcripts, then searches for gene fusions
  • Picky: structural variants pipeline for long reads
  • CREST: an algorithm for detecting genomic structural variations at base-pair resolution using next-generation sequencing data
  • Control-FREEC: a tool for detection of copy-number changes and allelic imbalances (including LOH) using deep-sequencing data
  • Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs
  • GISTIC2: facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers
  • BreaKmer: A method to identify structural variation from sequencing data in target regions
  • deTiN: DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.

Variant Annotation

  • ANNOVAR
  • SnpEff
  • gemini
  • VEP
  • Variant Annotation Integrator
  • vcfanno
  • pcgr
  • annovarR
  • OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes
  • bystro: Bystro genetic analysis (annotation, filtering, statistics)

Variant Visualization (SNVs, INDELs, SVs)

  • ProteinPaint
  • AGFusion
  • GenomeUPlot
  • BreakPointSurveyor
  • chimeraviz
  • Oncoprinter
  • MutationMapper
  • pv: 3D structure visualization in WEB
  • g2s: mappings between protein sequence positions and PDB 3D protein structure models
  • NGB: structural Variations (SVs) visualization capabilities, high performance, scalability, and cloud data support

Variant Screen

  • LARVA
  • DANN

Alternative Splicing

  • LeafCutter
  • rMATS

Gene Expression Data Analysis

  • Cufflinks
  • DESeq2
  • edgeR
  • HTSeq
  • sRNAnalyzer

Virus Related

  • viral-ngs
  • qap
  • ROP: discovering the source of all RNA-seq reads, including those originating from repeat sequences, recombinant B and T cell receptors, and microbial communities
  • ViFi: Pipeline for identifying viral integration and fusion mRNA reads from NGS data

Single Cell

  • seurat
  • SCnorm
  • dropClust
  • scran: batch effect adjustment
  • trendsceek: spatial expression trends in single-cell gene expression data
  • scRNA-tools: a database of software tools for the analysis of single-cell RNA-seq data.
  • awesome-single-cell: list of software packages (and the people developing these methods) for single-cell data analysis, including RNA-seq, ATAC-seq, etc.
  • SAVER: SAVER (Single-cell Analysis Via Expression Recovery) implements a regularized regression prediction and empirical Bayes method to recover the true gene expression profile in noisy and sparse single-cell RNA-seq data.

Protein Data Related

  • interproscan

Expression Quantitative Trait Loci, eQTL

  • CaVEMaN

ChIP-seq analysis

  • MACS
  • CEAS
  • MDSeqPos
  • conservation_plot

Primer Design

  • CEMAsuite
  • Primer3plus

Workflow

  • bcbio-nextgen
  • nextflow
  • orange3
  • sequana
  • snakemake
  • WDL
  • CWL

Unclassified

  • biopython
  • IRanges
  • org.Hs.eg.db
  • Biobase
  • GenomicAlignments
  • GenomicRanges
  • Rsamtools
  • jvarkit
  • htslib
  • samtools
  • bedtools
  • vcftools
  • bcftools
  • bamtools
  • maftools
  • bamUtil
  • vcflib
  • samstat
  • seqtk
  • sratools
  • bcl2fastq2
  • ucsc_utils
  • MeQA
  • IdCheck
  • SAMBLASTER
  • ngstk
  • BioInstaller
  • ChromHMM
  • ABSOLUTE
  • HAPSEG
  • Atlas-SNP, Atlas2 Suite
  • Beagle
  • CIBERSORT
  • biobloom
  • APAtrap
  • phenopredict: predicting phenotype sample information using gene expression
  • recount
  • bart: predicting functional transcription factors using gene set or a ChIP-seq dataset as input
  • LSMM (Latent Sparse Mixed Model): integrating functional annotations with genome-wide association studies
  • vcf2maf: Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms
  • r2d3: R Interface to D3 Visualizations
  • liteq: Serverless R message queue using SQLite
  • ReLaXed: Create PDF documents using web technologies
  • dash: RStudio Addin to Run a Selection as a Background Job
  • threadpool: Parallel Processing in R using a Thread Pool

Statistical and Visualization

  • medcalc
  • GraphPad
  • ImageJ
  • SPSS
  • R
  • gvmap
  • easySVG
  • hexmapr
  • clustergrammer
  • chromVAR
  • echarts
  • plotly
  • qvalue: estimating q-values and false discovery rate quantities
  • GenVisR: genome data visualizations
  • r-color-palettes: Comprehensive list of color palettes available in r
  • sequenza: a novel set of tools providing a fast python script to genotype cancer samples, and an R package to estimate cancer cellularity, ploidy, genome wide copy number profile and infer for mutated alleles
  • opencpu: A system for embedded scientific computing and reproducible research with R
  • ggthemr: Themes for ggplot2
  • paletter: Build your ggplot2 palette from a picture
  • ggdag: An R package for working with causal directed acyclic graphs (DAGs)
  • ggseqlogo: Publication-quality sequence logos in R.
  • threejs: JavaScript 3D library
  • higlass: Fast contact matrix visualization for the web (homepage: higlass.io)

Text editor and IDE

  • Vim
  • Emacs
  • Atom
  • Sublime
  • RStudio
  • Eclipse
  • PyCharm
  • Visual Studio

Remote Connection (SSH)

  • MobaXterm
  • Cygwin
  • Xshell & Xsftp
  • PuTTY
  • babun
  • cmder

Remote Connection (Desktop)

  • TeamViewer
  • Sunlogin
  • Splashtop
  • Chrome Remote Desktop app
  • LogMeIn
  • PC Anywhere
  • GoToMyPC
  • Radmin
  • UltraVNC

Other

  • igraph
  • root
  • boost
  • libtbb
  • docker

Books & Tutorials

R

  • R packages
  • stringr
  • Bioconductor Tutorial
  • limma
  • Learn ggplot2 in 30 Minutes (in Chinese)
  • R Graphics Cookbook
  • Introduction to data.table
  • RSQLite
  • R Graphics
  • Wordcloud2

Linux & Shell

  • The Linux Command Line
  • Advanced Bash-Scripting Guide
  • Wicked Cool Shell Scripts
  • VBird's Linux Private Kitchen (鳥哥的 Linux 私房菜, in Chinese)
  • Runoob tutorials (菜鳥教程, in Chinese)

Python

  • Learning Python, 5th Edition
  • Python Examples
  • Learning Python
  • Learning Python (Chinese edition)

C/C++

  • C Primer Plus
  • C++ Primer Plus 6th Edition

JAVA

  • The Java™ Tutorials

Statistics and Deep learning

  • SPSS Beginners Tutorials
  • Machine learning
  • Deep learning
  • Loss function
  • Maximum likelihood estimation
  • Bayes theorem
  • Perceptron
  • SVM
  • k-nearest neighbors algorithm
  • Convolutional Neural Network
  • K-Means
  • HMM
  • STAT115 - HMM PPT
  • Commonly Used Machine Learning Algorithms (in Chinese)
  • Machine Learning Resource List (in Chinese)
  • Review: Deep learning, genomics, and precision medicine
  • ML book list:

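│ 李航.統計學習方法.pdf (Li Hang, Statistical Learning Methods)
│ 機器學習及其應用.pdf (Machine Learning and Its Applications)
│ All of Statistics - A Concise Course in Statistical Inference - Larry Wasserman - Springer.pdf
│ Machine Learning - Tom Mitchell.pdf
│ PRML.pdf
│ PRML讀書會合集列印版.pdf (PRML reading group collection, print edition)
│ Programming Collective Intelligence.pdf
│ [奧萊理] Machine Learning for Hackers.pdf ([O'Reilly] Machine Learning for Hackers)
│ [機器學習]Tom.Mitchell.pdf ([Machine Learning] Tom Mitchell)
│ 《大數據:互聯網大規模數據挖掘與分散式處理》迷你書.pdf (Big Data: Large-Scale Internet Data Mining and Distributed Processing, mini-book)
│ 推薦系統實踐.pdf (Recommender Systems in Practice)
│ 數據挖掘-實用機器學習技術(中文第二版).pdf (Data Mining: Practical Machine Learning Techniques, Chinese 2nd edition)
│ 數據挖掘_概念與技術.pdf (Data Mining: Concepts and Techniques)
│ 機器學習-Mitchell-中文-清晰版.pdf (Machine Learning, Mitchell, Chinese edition, clear scan)
│ 機器學習導論.pdf (Introduction to Machine Learning)
│ 模式分類第二版中文版Duda.pdf(全).pdf (Pattern Classification, 2nd edition, Chinese edition, Duda, complete)
│ 深入搜索引擎--海量信息的壓縮、索引和查詢.pdf (Inside Search Engines: Compression, Indexing, and Querying of Massive Information)
│ 矩陣分析.美國 Roger.A.Horn.掃描版.pdf (Matrix Analysis, Roger A. Horn, scanned edition)
│ 統計學習基礎 數據挖掘、推理與預測.pdf (The Elements of Statistical Learning: Data Mining, Inference, and Prediction)

├─機器學習實戰 (Machine Learning in Action)
│ machinelearninginaction.zip
│ 機器學習實戰 單頁.pdf (Machine Learning in Action, single-page layout)
│ 機器學習實戰.pdf (Machine Learning in Action)

└─論文文集 (paper collection)
  └─LDA
      LDA-wangyi.pdf
      LDA數學八卦.pdf (LDA Math Gossip)
      text-est.pdf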

Git

  • Git tutorials
  • Git Tutorial (in Chinese)
  • GitHub Guides

Cloud

  • Cloud Computing
  • Docker Beginner Tutorial (in Chinese)

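Bioinformatics

  • BGI Bioinformatics Training Materials (in Chinese)
  • Introduction to Bioinformatics (in Chinese)
  • Best Practices for Getting Started in Bioinformatics (in Chinese)
  • The Biostar Handbook: A Beginner's Guide to Bioinformatics
  • Bioinformatics Data Skills
  • Bioinformatics Rookie Group blog (生信菜鳥團, in Chinese)
  • Bioinformatics Skill Tree forum (生信技能樹, in Chinese)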

Skills

Programming language

  • Shell
  • Python
  • R
  • HTML/CSS
  • Javascript
  • PHP
  • SQL
  • C/C++
  • JAVA
  • Perl

Statistics

  • t-test
  • Chi-squared test
  • ANOVA
  • Normal distribution
  • Wilcoxon signed-rank test

Code Management

  • Git
  • GitHub

Institutes and Companies

  • Broad Institute
  • The European Bioinformatics Institute
  • Illumina
  • Life Technologies
  • QIAGEN

People

  • Eric Lander
  • Leroy Hood
  • Mark Gerstein
  • Shirley Liu
  • Chuan He
  • Bing Ren
  • Job Dekker
  • Michael Snyder
  • Howard Chang
  • Mitch Guttman
  • John Rinn
  • Bradley E. Bernstein
  • Richard Michael Durbin
  • Pavel A. Pevzner
  • Brendan J. Frey
  • Jinghui Zhang
  • Ira M. Hall

Blog

  • Jianfeng Li's blog
  • RNA-seq Blog
  • Jianming Zeng's blog
  • Yihui Xie's blog
  • Fei Zhao's blog
  • Mengyuan Shen's blog
  • Boqiang Hu's blog
  • Bob's Blog
  • Homolog.us - Frontier in Bioinformatics
  • r-bloggers
  • DataTau
  • Bits of DNA, Lior Pachter
  • Next Generation Technologist
  • Simply Statistics
  • Massgenomics
  • OpenHelix
  • QIAGEN
  • Loman Labs Blog
  • Living in an Ivory Basement: Stochastic thoughts on science, testing, and programming
  • Neil Saunders
  • Mike Love's blog
  • Ewan Birney
  • In between lines of code
  • Heng Li's blog
  • MacArthur Lab
  • Blue Collar Bioinformatics
  • Simpson Lab
  • Bits of Bioinformatics

Contributors

  • Jianfeng Li
  • Bowen Cui
  • Shixiang Wang
  • l0o0
