Base vs Byte: Estimating the storage requirement of sequencing

Sequencing Storage Requirement for Raw Data

Sequencing data output is measured in gigabases (Gb), while computers store information in gigabytes (GB), which can be confusing at times.

 

Rule of thumb: 4 bases ≈ 1 byte of storage   (for data in FASTQ format)

 

Base Pair Nomenclature

Base pairs (bp)

1 kilobase pair (kb) = 1,000 bp

1 megabase pair (Mb) = 1,000,000 bp or 1,000 kb

1 gigabase pair (Gb) = 1,000,000,000 bp or 1,000 Mb

1 terabase pair (Tb) = 1,000 Gb

1 petabase pair (Pb) = 1,000 Tb

 

Computer systems follow a similar nomenclature

Byte (B)

1 kilobyte (KB) = 1,024 B

1 megabyte (MB) = 1,024 KB

1 gigabyte (GB) = 1,024 MB

1 terabyte (TB) = 1,024 GB

1 petabyte (PB) = 1,024 TB

 

For practical purposes and easy computation, the factor of 1,024 in computer nomenclature can be treated as 1,000.
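To see how much this rounding costs, the short Python sketch below compares 1,000 against 1,024 at each unit step. The function name `approximation_error` is ours, chosen for illustration. At the gigabyte level the difference is a little under 7%, which is small enough for a planning estimate.

```python
# How far off is the "1,024 is about 1,000" shortcut at each unit step?
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def approximation_error(power: int) -> float:
    """Fractional difference between 1024**power and 1000**power."""
    return (1024 ** power - 1000 ** power) / 1024 ** power


for power, unit in enumerate(UNITS, start=1):
    # power=1 is KB, power=2 is MB, and so on.
    print(f"1 {unit}: using 1,000 underestimates by {approximation_error(power):.1%}")
```

The error grows with each step up in unit, so it is worth adding a small safety margin when planning at the terabyte or petabyte scale.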

Currently, most calculations in genomics are done in gigabases, so we use that unit for our calculations and for estimating storage requirements.

Gigabytes (GB) = Gigabases (Gb) / 4

Therefore, a flow cell generating 400 Gb of data will require 400/4 = 100 GB of disk space.
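The same arithmetic can be written as a small Python helper. This is only a sketch of the rule of thumb above; the names `BASES_PER_BYTE` and `estimate_storage_gb` are ours, and the factor of 4 is an approximation rather than an exact property of any file.

```python
BASES_PER_BYTE = 4  # rule of thumb from this article; an approximation


def estimate_storage_gb(gigabases: float, bases_per_byte: float = BASES_PER_BYTE) -> float:
    """Estimate disk space in gigabytes for a sequencing output given in gigabases."""
    if gigabases < 0:
        raise ValueError("gigabases must not be negative")
    if bases_per_byte <= 0:
        raise ValueError("bases_per_byte must be positive")
    return gigabases / bases_per_byte


# A flow cell producing 400 gigabases of data:
print(estimate_storage_gb(400))  # 100.0
```

Keeping the ratio as a parameter makes it easy to plug in a different figure if your own files turn out larger or smaller than the rule of thumb suggests.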

However, note that during analysis the data expands to about 4 times its original size. Therefore, before performing any analysis, make sure the machine has enough free disk space.
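A quick way to check this before starting a run is to compare the free space on the working disk with the expected expanded size. The Python sketch below uses the standard library call `shutil.disk_usage`; the helper names and the default factor of 4 (taken from the figure above) are our assumptions, and real pipelines may need more or less.

```python
import shutil

EXPANSION_FACTOR = 4  # figure quoted in this article; varies by pipeline


def required_working_space_gb(raw_gb: float, expansion_factor: float = EXPANSION_FACTOR) -> float:
    """Disk space (GB) needed if the raw data grows to expansion_factor times its size."""
    return raw_gb * expansion_factor


def has_enough_space(path: str, raw_gb: float, expansion_factor: float = EXPANSION_FACTOR) -> bool:
    """Return True if the disk holding `path` has room for the expanded data."""
    # disk_usage reports bytes; 1 GB is treated as 1000**3 bytes, as in the text.
    free_gb = shutil.disk_usage(path).free / 1000 ** 3
    return free_gb >= required_working_space_gb(raw_gb, expansion_factor)


# 100 GB of raw data would need about 400 GB of working space.
print(required_working_space_gb(100))  # 400
print(has_enough_space(".", 100))
```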

For long-term storage or sharing, the FASTQ files can be compressed using various compression algorithms and stored in the cloud.
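As one example, gzip is a widely used choice for FASTQ files. The Python sketch below compresses a file with the standard library `gzip` module; the function name `compress_fastq` and the file name in the usage comment are hypothetical, and other compression tools would work just as well.

```python
import gzip
import shutil
from pathlib import Path


def compress_fastq(fastq_path: str) -> Path:
    """Write a gzip-compressed copy of a FASTQ file next to the original."""
    src = Path(fastq_path)
    dst = src.with_name(src.name + ".gz")
    # Stream the file in chunks so large FASTQ files are not loaded into memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Usage (hypothetical file name):
# compressed = compress_fastq("sample_R1.fastq")
# print(compressed)  # sample_R1.fastq.gz
```

The original file is left in place, so it can be removed only after the compressed copy has been verified.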

 

Reference

Stephens et al. Big Data: Astronomical or Genomical? PLOS Biology July 7, 2015

https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195

 

At SEQOME, we provide data analysis solutions from as low as 1 Euro per gigabase of data, depending on the requirements. *Conditions apply.

www.seqome.com

 
