BioInstaller: An Attempt at Developing a Shiny Application

From the column: Bioinformatics

Introduction

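A little over a year ago I made the first commit to BioInstaller, and the project has since iterated to v0.3.6. I originally started it simply because I wanted my own tool for managing and downloading bioinformatics resources, such as reference genomes, software for variant detection and annotation, and the related annotation databases. Later, as a way to learn Shiny, I built a Shiny graphical interface for BioInstaller along with some supporting infrastructure (queue management, task submission, a plugin system, variable management, etc.).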

I hope it will be helpful in the following areas:

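  • Deployment of bioinformatics resources: integrates conda, spack, and custom deployment scripts or functions, and provides Shiny interfaces for all of them.
  • Collection of bioinformatics resources: using the simple TOML format, hundreds of bioinformatics tools, scripts, and databases have been collected and organized.
  • Sharing of bioinformatics resources: a public GitHub repository has been set up; through TOML files, users can share their data resources and data analysis plugins on the Shiny platform built by BioInstaller.
  • Building data analysis pipelines: a rich collection of TOML data files is provided, which you can fetch online or copy directly when building a pipeline and then integrate into it, for example the metadata of ANNOVAR's variant annotation databases.
  • Building Shiny applications: you can use BioInstaller's Shiny application directly instead of developing basic features yourself, such as file upload and management, a plugin system, and task queues and submission. You only need to write a TOML-format data analysis plugin for your core functionality (which can be wrapped in an R function or an R command).
  • Reproducible data analysis: we provide a complete workflow for reproducible data analysis, with traceable output files and logs, and a Docker container that bundles Shiny, OpenCPU, and RStudio services.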
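
The plugin model in the list above (wrap the core functionality in an R function, describe it declaratively, and let the platform supply the UI and plumbing) can be sketched in base R. This is a hypothetical illustration, not BioInstaller's actual plugin format: a plain R list stands in for the TOML plugin file, and `register_plugin`, `run_plugin`, and the `demo_summary` plugin are invented names.

```r
# Hypothetical plugin registry: each plugin pairs a description (standing in
# for a TOML plugin file) with the R function that implements it.
plugin_registry <- new.env()

register_plugin <- function(name, fun, params, description = "") {
  # 'params' lists the arguments a UI would render as input widgets;
  # every declared parameter must exist in the wrapped function.
  stopifnot(is.function(fun), all(params %in% names(formals(fun))))
  assign(name, list(fun = fun, params = params, description = description),
         envir = plugin_registry)
  invisible(name)
}

run_plugin <- function(name, args) {
  if (!exists(name, envir = plugin_registry, inherits = FALSE)) {
    stop("unknown plugin: ", name)
  }
  plugin <- get(name, envir = plugin_registry, inherits = FALSE)
  # Keep only the declared parameters, as a form-driven UI would
  args <- args[intersect(names(args), plugin$params)]
  do.call(plugin$fun, args)
}

# Hypothetical plugin wrapping a small "core analysis" function
register_plugin(
  "demo_summary",
  fun = function(x, digits = 2) round(mean(x), digits),
  params = c("x", "digits"),
  description = "Mean of a numeric vector"
)

run_plugin("demo_summary", list(x = c(1, 2, 4), digits = 1, ignored = TRUE))
# returns 2.3 (the undeclared 'ignored' argument is dropped)
```

Separating the declaration from the function is what lets a generic front end render inputs and dispatch calls without plugin-specific code.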

This R package is still being refined and iterated on, and I hope it can provide everyone with a free, open-source Shiny analysis environment.

Future development directions

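Since my own time may not allow development to continue steadily and quickly, I have listed the main features that need improvement. If you are interested, you are welcome to contribute code:

  • Further improvement of task management and the queue system (using Galaxy as a reference)
  • More data analysis and visualization plugins (e.g., WES/RNA-seq/ChIP-seq/ATAC-seq)
  • A Shiny plugin management interface (plugins are currently managed by editing YAML files)
  • Further collection, classification, and mirroring of bioinformatics resources based on TOML files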
  • ......
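
As a possible starting point for the queue item above, a first-in-first-out task queue with per-task status tracking can be sketched in base R. This is a hypothetical illustration: `new_queue`, `submit_task`, `run_next`, `task_status`, and the status values are invented names, and tasks run one at a time in the current session, whereas a production system would hand them to separate worker processes.

```r
# Hypothetical FIFO task queue with per-task status tracking.
new_queue <- function() {
  q <- new.env()
  q$tasks <- list()   # task records keyed by id, in submission order
  q$next_id <- 1L
  q
}

submit_task <- function(q, fun, args = list()) {
  id <- q$next_id
  q$tasks[[as.character(id)]] <- list(
    id = id, fun = fun, args = args,
    status = "queued", result = NULL, error = NULL
  )
  q$next_id <- id + 1L
  id
}

run_next <- function(q) {
  queued <- names(Filter(function(t) t$status == "queued", q$tasks))
  if (length(queued) == 0) return(invisible(NULL))
  key <- queued[1]   # oldest queued task first
  task <- q$tasks[[key]]
  # Capture errors so that one failing task does not stop the queue
  outcome <- tryCatch(
    list(ok = TRUE, value = do.call(task$fun, task$args)),
    error = function(e) list(ok = FALSE, value = conditionMessage(e))
  )
  if (outcome$ok) {
    task$status <- "finished"
    task$result <- outcome$value
  } else {
    task$status <- "failed"
    task$error <- outcome$value
  }
  q$tasks[[key]] <- task
  invisible(task$id)
}

task_status <- function(q) {
  vapply(q$tasks, function(t) t$status, character(1))
}

q <- new_queue()
submit_task(q, function(n) sum(seq_len(n)), list(n = 10))
submit_task(q, function() stop("download failed"))
run_next(q)
run_next(q)
task_status(q)  # "1" = "finished", "2" = "failed"
```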

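If you have any suggestions, besides opening an issue on the project homepage, you can also comment directly on this blog post; comments are stored in the GitHub issues of the blog's forum.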

English Introduction

The increase in bioinformatics resources such as tools/scripts and databases poses a great challenge for users seeking to construct interactive and reproducible biological data analysis applications.

R, as the most popular programming language for statistics, biological data analysis, and big data, offers a wide range of free R packages (>14000) for different types of applications. However, due to the lack of high-performance, open-source cloud platforms based on R (comparable to Galaxy in the Python ecosystem), it is still difficult for R users, especially those without web development skills, to build interactive and reproducible biological data analysis applications that support file upload and management, long-running computation, task submission, tracking of output files, exception handling, logging, export of plots and tables, and an extensible plugin system.
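
Tracking of output files and logging, two of the requirements listed above, can be illustrated with a small base R helper that runs one analysis step, appends timestamped log lines, and records an MD5 checksum for every output file so results can later be traced and compared. The helper `run_tracked` and its log format are hypothetical, not BioInstaller's implementation; `tools::md5sum` ships with base R.

```r
# Hypothetical helper: run one step, log it, and checksum its output files.
run_tracked <- function(step_name, fun, outputs, log_file) {
  log_line <- function(msg) {
    stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
    cat(sprintf("[%s] %s: %s\n", stamp, step_name, msg),
        file = log_file, append = TRUE)
  }
  log_line("started")
  tryCatch(fun(), error = function(e) {
    log_line(paste("error:", conditionMessage(e)))
    stop(e)  # re-raise after logging
  })
  # One checksum per output file (NA if a file is missing)
  sums <- unname(tools::md5sum(outputs))
  for (i in seq_along(outputs)) {
    log_line(sprintf("output %s md5=%s", basename(outputs[i]), sums[i]))
  }
  log_line("finished")
  invisible(data.frame(file = outputs, md5 = sums, stringsAsFactors = FALSE))
}

out_file <- tempfile(fileext = ".txt")
log_path <- tempfile(fileext = ".log")
manifest <- run_tracked("write_table",
                        function() writeLines(c("a", "b"), out_file),
                        out_file, log_path)
readLines(log_path)  # three lines: started, output ... md5=..., finished
```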

The collection, management, and sharing of various bioinformatics tools/scripts and databases are also essential for almost all bioinformatics analysis projects.

Here, we established a new platform for building interactive and reproducible biological data analysis applications based on R. The platform provides diverse user interfaces, including R functions, an R Shiny application, and REST APIs, and supports collecting, managing, sharing, and utilizing large numbers of bioinformatics tools/scripts and databases.

Features:

  • Easy-to-use
  • User-friendly Shiny application
  • An integrative platform of databases and bioinformatics resources
  • Open source and completely free
  • One-click download and installation of bioinformatics resources (via R, Shiny, or OpenCPU REST APIs)
  • Greater attention to software and database resources not covered by other tools
  • Logging
  • System monitor
  • Task submission
  • Long-running computation
  • Parallel tasks
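
For the parallel-task feature, independent jobs (for example several downloads or installations) can be dispatched to worker processes with the `parallel` package that ships with R. A minimal sketch: `fake_install` is a hypothetical stand-in for a real installation step, and because `mclapply` relies on forking, which Windows does not support, the code falls back to a single core there.

```r
library(parallel)

# Hypothetical stand-in for a long-running download/install task
fake_install <- function(name) {
  Sys.sleep(0.1)  # simulate work
  list(name = name, status = "done", pid = Sys.getpid())
}

tools_to_install <- c("bwa", "samtools", "bcftools", "bedtools")

# Forking is unavailable on Windows; detectCores() may return NA
n_cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  max(1L, min(2L, detectCores(), na.rm = TRUE))
}

# mclapply returns results in the same order as the input
results <- mclapply(tools_to_install, fake_install, mc.cores = n_cores)
vapply(results, function(r) paste(r$name, r$status), character(1))
```

Because the tasks share no state, each one can run in its own process; tasks that depend on each other would need ordering through a queue instead.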

Fields

  • Quality Control
  • Alignment And Assembly
  • Alternative Splicing
  • ChIP-seq analysis
  • Gene Expression Data Analysis
  • Variant Detection
  • Variant Annotation
  • Virus Related
  • Statistical and Visualization
  • Noncoding RNA Related Database
  • Cancer Genomics Database
  • Regulator Related Database
  • eQTL Related Database
  • Clinical Annotation
  • Drugs Database
  • Proteomic Database
  • Software Dependence Database
  • Bioinformatics-Resources
  • ......

Shiny UI overview

# install the latest development version
# then start the BioInstaller R Shiny application
# (the documentation is still under construction)
BioInstaller::web(auto_create = TRUE)

Installation

CRAN

# You can install this package directly from CRAN by running (from within R):
install.packages("BioInstaller")

GitHub

# install.packages("devtools")
devtools::install_github("JhuangLab/BioInstaller")

Contributed Resources

  • GitHub resource
  • GitHub resource meta information
  • Non-GitHub resource
  • Non-GitHub resource meta information
  • Database
  • Web Service
  • Docker

Support Summary

Quality Control:

  • FastQC, PRINSEQ, SolexaQA, FASTX-Toolkit ...

Alignment and Assembly:

  • BWA, STAR, TMAP, Bowtie, Bowtie2, tophat2, hisat2, GMAP-GSNAP, ABySS, SSAHA2, Velvet, Edena, Trinity, oases, RUM, MapSplice2, NovoAlign ...

Variant Detection:

  • GATK, Mutect, VarScan2, FreeBayes, LoFreq, TVC, SomaticSniper, Pindel, Delly, BreakDancer, FusionCatcher, Genome STRiP, CNVnator, CNVkit, SpeedSeq ...

Variant Annotation:

  • ANNOVAR, SnpEff, VEP, oncotator ...

Utils:

  • htslib, samtools, bcftools, bedtools, bamtools, vcftools, sratools, picard, HTSeq, seqtk, UCSC Utils(blat, liftOver), bamUtil, jvarkit, bcl2fastq2, fastq_tools ...

Genome:

  • hisat2_reffa, ucsc_reffa, ensemble_reffa ...

Others:

  • sparsehash, SQLite, pigz, lzo, lzop, bzip2, zlib, armadillo, pxz, ROOT, curl, xz, pcre, R, gatk_bundle, ImageJ, igraph ...

Databases:

  • ANNOVAR, blast, CSCD, GATK_Bundle, biosystems, civic, denovo_db, dgidb, diseaseenhancer, drugbank, ecodrug, expression_atlas, funcoup, gtex, hpo, inbiomap, interpro, medreaders, mndr, msdd, omim, pancanqtl, proteinatlas, remap2, rsnp3, seecancer, srnanalyzer, superdrug2, tumorfusions, varcards ...

Docker

BioInstaller has been available as a Docker image since v0.3.0, and the image has included the Shiny application since v0.3.5.

docker pull bioinstaller/bioinstaller
docker run -it -p 80:80 -p 8004:8004 -v /tmp/download:/tmp/download bioinstaller/bioinstaller

Service list:

  • http://localhost/ocpu/ OpenCPU service
  • http://localhost/shiny/BioInstaller Shiny service
  • http://localhost/rstudio/ RStudio Server (username/password: opencpu/opencpu)
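
The OpenCPU service above exposes the functions of installed R packages over HTTP: a POST to `/ocpu/library/{package}/R/{function}` calls the function, and appending `/json` returns its value as JSON. Form-encoded argument values are interpreted as R expressions, so strings must be sent as quoted literals. The helper `ocpu_request` below is a hypothetical sketch that only builds the URL and request body; it does not send anything.

```r
# Hypothetical helper: build an OpenCPU RPC request for an R function.
ocpu_request <- function(base_url, pkg, fun, args = list()) {
  base <- sub("/+$", "", base_url)  # drop trailing slashes
  url <- sprintf("%s/library/%s/R/%s/json", base, pkg, fun)
  # Deparse each value into R code (strings gain quotes), then URL-encode it
  fields <- vapply(names(args), function(nm) {
    value <- paste(deparse(args[[nm]]), collapse = "")
    paste0(nm, "=", utils::URLencode(value, reserved = TRUE, repeated = TRUE))
  }, character(1))
  list(url = url, body = paste(fields, collapse = "&"))
}

req <- ocpu_request("http://localhost/ocpu/", "stats", "rnorm", list(n = 3))
req$url   # "http://localhost/ocpu/library/stats/R/rnorm/json"
req$body  # "n=3"
```

For example, `ocpu_request("http://localhost/ocpu", "BioInstaller", "install.bioinfo", list(name = "bwa"))` produces the body `name=%22bwa%22`, assuming the installer is exported as `install.bioinfo` with a `name` argument (check the package documentation). The URL and body can then be sent as a form-encoded POST with any HTTP client, such as curl.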
