
Note (Help Messages): adding the -h flag after a command (e.g. structural_pangenome find-ngaps -h) prints the help message for that command.

Step1: Ngap detection

Description 

Report the stretches of Ns (Ngaps) in each assembly

Command 

structural_pangenome find-ngaps assembly_fasta_file -o output_file

OR

structural_pangenome find-ngaps assembly_fasta_file -p prefix_folder

haptic command

haptic find-ngaps assembly_fasta_file -o output_file

OR

haptic find-ngaps assembly_fasta_file -p prefix_folder

Parameters 

o: output filename (with complete path)

p: prefix_folder which stores the resulting output file.

t: optional, threshold (integer): minimum number of times an Ngap length must recur to be taken into account in the statistics

Input 

Fasta file for each assembly (FILE)

output 

Ngap.bed file for each assembly (OUTPUT)

Usage

structural_pangenome find-ngaps [-h] (-p PREFIX | -o OUTPUT) [-t THRESHOLD] FILE

Example 

structural_pangenome find-ngaps fastas/assembly_1.fas -o outputs/1.Ngaps/resulting_ngaps.bed

haptic example

haptic find-ngaps fastas/assembly_1.fas -o outputs/resulting_ngaps.bed

Can be Parallelized 

Yes 

Optional  

No 
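The Ngap detection described in Step1 can be illustrated with a short Python sketch. It is not the tool's implementation and rests on assumptions: an Ngap is taken to be a maximal run of N/n, intervals are written BED-style (0-based start, exclusive end), and recurrent_gap_lengths is only a hypothetical reading of the -t threshold. All function names are illustrative.

```python
# Illustrative sketch; helper names are hypothetical, not part of structural_pangenome/haptic.
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Gap = Tuple[str, int, int]  # (sequence id, start, end) in BED coordinates


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []  # first word of the header
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_ngaps(records: Iterable[Tuple[str, str]]) -> List[Gap]:
    """One interval per maximal run of N/n: 0-based start, exclusive end."""
    return [
        (name, m.start(), m.end())
        for name, seq in records
        for m in re.finditer(r"[Nn]+", seq)
    ]


def recurrent_gap_lengths(gaps: List[Gap], threshold: int) -> Dict[int, int]:
    """Hypothetical reading of -t: keep gap lengths seen at least `threshold` times."""
    counts = Counter(end - start for _, start, end in gaps)
    return {length: n for length, n in counts.items() if n >= threshold}


def to_bed(gaps: List[Gap]) -> str:
    """Format gaps as tab-separated BED lines."""
    return "".join(f"{name}\t{start}\t{end}\n" for name, start, end in gaps)


if __name__ == "__main__":
    fasta = [">chr1 example", "ACGTNNNNAC", "GTNN", ">chr2", "NNAC"]
    gaps = find_ngaps(read_fasta(fasta))
    print(to_bed(gaps), end="")
    print(recurrent_gap_lengths(gaps, threshold=2))
```

Sequence lines are joined before searching, so a run of Ns that is wrapped over two FASTA lines is still reported as a single gap.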


Step2.1: Indexing

Description 

Index the assembly fasta file

Command 

structural_pangenome index assembly_fasta_file -o output_file

OR

structural_pangenome index assembly_fasta_file -p prefix_folder

haptic command

haptic index assembly_fasta_file -o output_file

OR

haptic index assembly_fasta_file -p prefix_folder

Parameters 

o: output filename (with complete path)

p: prefix_folder which stores the resulting output file.

Input 

Fasta file for each assembly (FILE)

output 

Indexed fasta file for each assembly (OUTPUT)

Usage

structural_pangenome index [-h] (-p PREFIX | -o OUTPUT) FILE

Example 

structural_pangenome index fastas/assembly_1.fas -p outputs/2.1.indexed_fasta

haptic example

haptic index fastas/assembly_1.fas -p outputs/2.1.indexed_fasta/

Can be Parallelized 

Yes 

Optional  

No 
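Steps marked as parallelizable run once per assembly and do not depend on each other, so they can be launched concurrently. A minimal sketch, assuming the structural_pangenome executable is on PATH and that the assemblies are stored as fastas/*.fas (both assumptions); index_command and run_all are hypothetical helper names, and only the documented "index FILE -p PREFIX" form is used.

```python
# Illustrative sketch; helper names are hypothetical, not part of structural_pangenome/haptic.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Sequence


def index_command(fasta: str, prefix: str) -> List[str]:
    """Argument list for one indexing run, using the documented -p form."""
    return ["structural_pangenome", "index", fasta, "-p", prefix]


def run_all(commands: Sequence[List[str]], workers: int = 4) -> List[int]:
    """Run independent commands concurrently; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    # Assumed layout: one assembly per fastas/*.fas file.
    fastas = sorted(str(p) for p in Path("fastas").glob("*.fas"))
    commands = [index_command(f, "outputs/2.1.indexed_fasta") for f in fastas]
    codes = run_all(commands)
    failed = [cmd[2] for cmd, code in zip(commands, codes) if code != 0]
    print(f"{len(commands) - len(failed)} succeeded, failed: {failed}")
```

The same run_all helper can drive the other per-assembly steps (for example find-ngaps, align or convert paf2delta) by swapping in a different command builder. Threads are sufficient here because the actual work happens in the child processes.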


Step2.2: Iterative Pangenome graph build

Description 

Build the pangenome graph iteratively with minigraph. The order in which the assemblies are processed must be defined beforehand: first choose the assembly to start with, then rank the remaining assemblies by assembly completeness and by distance to that first assembly (from closest to farthest).

Command 

structural_pangenome build-graph assembly1_fasta_file assembly2_fasta_file -o assembly1__assembly2_graph_filename

OR

structural_pangenome build-graph assembly1_fasta_file assembly2_fasta_file -p folder_to_populate_output

haptic command

haptic build-graph assembly1_fasta_file assembly2_fasta_file -o assembly1__assembly2_graph_filename

OR

haptic build-graph assembly1_fasta_file assembly2_fasta_file -p folder_to_populate_output

Parameters 

o: output filename (with complete path)

p: prefix_folder which stores the resulting output file (with complete path)

Input 

Fasta file for each assembly (FASTA) and, possibly, an existing gfa file to add to (GRAPH)

output 

One gfa file for all assemblies (OUTPUT)

Usage

structural_pangenome build-graph [-h] (-p PREFIX | -o OUTPUT) GRAPH FASTA

Example 

structural_pangenome build-graph fastas/assembly_1.fas fastas/assembly_2.fas -p outputs/2.2gfas/assembly_pangenome_graph

haptic example

haptic build-graph fastas/assembly_1.fas fastas/assembly_2.fas -p outputs/assembly_pangenome_graph

Parallelizable 

No 

Optional  

No 
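Because each build-graph call adds one assembly to the graph produced by the previous call (GRAPH FASTA in the usage line), the whole build is a chain of sequential commands. The sketch below generates that chain from an already-ranked list of assemblies. The "__"-joined output names and the build_graph_commands helper are hypothetical; choosing the order itself (completeness, distance to the first assembly) is still up to the user as described above.

```python
# Illustrative sketch; helper name and output naming are hypothetical.
import shlex
from pathlib import Path
from typing import List


def build_graph_commands(ordered_fastas: List[str], out_dir: str) -> List[List[str]]:
    """Chain build-graph calls: the first merges the first two assemblies, and each
    later call adds the next assembly to the GFA produced by the previous call."""
    if len(ordered_fastas) < 2:
        raise ValueError("at least two assemblies are needed")
    stems = [Path(ordered_fastas[0]).stem]
    graph = ordered_fastas[0]
    commands = []
    for fasta in ordered_fastas[1:]:
        stems.append(Path(fasta).stem)
        # Hypothetical naming scheme: assembly stems joined by "__".
        output = str(Path(out_dir) / ("__".join(stems) + ".gfa"))
        commands.append(["structural_pangenome", "build-graph", graph, fasta, "-o", output])
        graph = output  # the next iteration extends this graph
    return commands


if __name__ == "__main__":
    order = ["fastas/assembly_1.fas", "fastas/assembly_2.fas", "fastas/assembly_3.fas"]
    for cmd in build_graph_commands(order, "outputs/2.2gfas"):
        print(shlex.join(cmd))  # run these one after another, never in parallel
```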


Step2.3: Graph & stat

Description

Extract all fragmented sequences of the minigraph pangenome into a single FASTA file

Command

structural_pangenome convert gfa2fa assembly_1_2_3.gfa > pangenome__fasta.fas

Parameters

gfa2fa: convert a GFA file to a FASTA file

Input

One gfa file for all assemblies (GFA)

output

One FASTA file for the whole pangenome (OUTPUT)

Usage

structural_pangenome convert gfa2fa GFA > OUTPUT

Example

structural_pangenome convert gfa2fa 2.2.gfas/assembly123.gfa > 2.3.Stat/pangenome__fasta.fas

Parallelizable

No

Optional

No
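As a rough illustration of what a GFA-to-FASTA conversion involves, assuming the minigraph output is (r)GFA in which every S line holds a segment name in column 2 and its sequence in column 3, a minimal Python sketch follows. Helper names are hypothetical, and the tool may differ, for example in record naming or line wrapping.

```python
# Illustrative sketch; helper names are hypothetical, not part of structural_pangenome/haptic.
from typing import Iterable, Iterator, Tuple


def gfa_segments(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (segment_name, sequence) for every GFA 'S' line that carries a sequence."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # S <name> <sequence> [optional tags such as LN:i, SN:Z, SO:i, SR:i]
        if fields[0] == "S" and len(fields) >= 3 and fields[2] != "*":
            yield fields[1], fields[2]


def to_fasta(segments: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    lines = []
    for name, seq in segments:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    gfa = [
        "H\tVN:Z:1.0",
        "S\ts1\tACGTACGT\tLN:i:8\tSN:Z:chr1\tSO:i:0\tSR:i:0",
        "S\ts2\tGG\tLN:i:2\tSN:Z:chr1\tSO:i:8\tSR:i:1",
        "L\ts1\t+\ts2\t+\t0M",
    ]
    print(to_fasta(gfa_segments(gfa), width=4), end="")
```

Link (L) lines are ignored, so the FASTA keeps only the segment sequences, not the graph topology.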


Step2.4: Pangenome Sequence Length distribution

Description 

Get the pangenome sequence length distribution 

Command 

structural_pangenome find-seqlen pangenome_fasta_file -o output_file

OR

structural_pangenome find-seqlen pangenome_fasta_file -p prefix_folder

haptic command

haptic find-seqlen pangenome_fasta_file -o output_file

OR

haptic find-seqlen pangenome_fasta_file -p prefix_folder

Parameters 

o: output filename (with complete path) (OUTPUT)

p: prefix_folder which stores the resulting output file (with complete path) (PREFIX)

g: optional parameter which also displays a graph of the sequence length distribution (-g)

Input 

Pangenome Fasta file (FILE) - From Step2.3

output 

One csv file for the whole Pangenome (OUTPUT)

Usage

structural_pangenome find-seqlen [-h] (-p PREFIX | -o OUTPUT) [-g] FILE

Example 

structural_pangenome find-seqlen pangenomes/pangenome__fasta.fas -o outputs/2.4.PangSeqLen/pangenome_seqlen.csv -g

haptic example

haptic find-seqlen pangenomes/pangenome__fasta.fas -o outputs/pangenome_seqlen.csv

Can be Parallelized 

No 

Optional  

Yes 
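The distribution reported by find-seqlen can be approximated by counting how many pangenome sequences have each length. This sketch assumes a two-column length,count CSV, which is a hypothetical layout and not necessarily the columns find-seqlen writes; plotting (the -g option) is omitted. Helper names are hypothetical.

```python
# Illustrative sketch; helper names and CSV columns are hypothetical.
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence_id, length) without keeping whole sequences in memory."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def length_distribution_csv(lines: Iterable[str]) -> str:
    """CSV with one row per distinct sequence length and its number of sequences."""
    counts = Counter(length for _, length in fasta_lengths(lines))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["length", "count"])
    for length in sorted(counts):
        writer.writerow([length, counts[length]])
    return buf.getvalue()


if __name__ == "__main__":
    demo = [">s1", "ACGT", ">s2", "GG", ">s3", "AC", "GT"]
    print(length_distribution_csv(demo), end="")
```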


Step2.5: Assembly vs pangenome

Description 

Align the pangenome (QUERY) against each indexed assembly (TARGET)

Command 

structural_pangenome align indexed_assembly_fasta_file pangenome_fasta_file -o output_file

OR

structural_pangenome align indexed_assembly_fasta_file pangenome_fasta_file -p prefix_folder

haptic command

haptic align indexed_assembly_fasta_file pangenome_fasta_file -o output_file

OR

haptic align indexed_assembly_fasta_file pangenome_fasta_file -p prefix_folder

Parameters 

o: output filename (with complete path)

p: prefix_folder which stores the resulting output file (with complete path)

Input 

Indexed fasta file for each assembly (TARGET) - From Step2.1, and the Pangenome Fasta file (QUERY) - From Step2.3

output 

Pangenome paf file (OUTPUT)

Usage

structural_pangenome align [-h] (-p PREFIX | -o OUTPUT) TARGET QUERY

Example 

structural_pangenome align fastas/assembly_1_indexed.fas pangenome/pangenome__fasta.fas -o outputs/output_pangenome.paf

haptic example

haptic align fastas/assembly_1_indexed.fas pangenome/pangenome__fasta.fas -o outputs/output_pangenome.paf

Can be Parallelized 

Yes 

Optional 

No 
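The PAF file produced by align is tab-separated, with 12 mandatory columns (query name, length, start, end, strand, target name, length, start, end, number of matching bases, alignment block length, mapping quality) followed by optional TAG:TYPE:VALUE fields; coordinates are 0-based and end-exclusive. With the pangenome as QUERY and the assembly as TARGET, qname is a pangenome sequence and tname an assembly sequence. A small parser sketch, with a hypothetical PafRecord type:

```python
# Illustrative sketch; PafRecord and parse_paf_line are hypothetical names.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PafRecord:
    qname: str
    qlen: int
    qstart: int      # 0-based, inclusive
    qend: int        # 0-based, exclusive
    strand: str      # "+" or "-"
    tname: str
    tlen: int
    tstart: int
    tend: int
    matches: int     # number of matching bases
    block_len: int   # alignment block length, gaps included
    mapq: int
    tags: Dict[str, str] = field(default_factory=dict)

    @property
    def identity(self) -> float:
        return self.matches / self.block_len if self.block_len else 0.0


def parse_paf_line(line: str) -> PafRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"expected at least 12 PAF columns, got {len(cols)}")
    # Optional fields look like TAG:TYPE:VALUE, e.g. tp:A:P or cg:Z:50M2D40M.
    tags = {key: value for key, _type, value in (c.split(":", 2) for c in cols[12:])}
    return PafRecord(
        cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4],
        cols[5], int(cols[6]), int(cols[7]), int(cols[8]),
        int(cols[9]), int(cols[10]), int(cols[11]), tags,
    )


if __name__ == "__main__":
    rec = parse_paf_line("seg1\t100\t0\t90\t+\tctg1\t5000\t1000\t1092\t85\t92\t60\ttp:A:P\tcg:Z:50M2D40M")
    print(rec.qname, rec.tname, f"{rec.identity:.3f}")
```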


Step3: Convert the PAF file into a delta file

Description 

Convert the PAF file into a delta file. 

Command 

structural_pangenome convert paf2delta paf_file assembly__fasta pangenome__fasta -o output_file

OR

structural_pangenome convert paf2delta paf_file assembly__fasta pangenome__fasta -p prefix_folder

haptic command

haptic convert paf2delta paf_file assembly__fasta pangenome__fasta -o output_file

OR

haptic convert paf2delta paf_file assembly__fasta pangenome__fasta -p prefix_folder

Parameters 

o: output filename (with complete path)

p: prefix_folder which stores the resulting output file (with complete path)

Input 

  • Pangenome Fasta File to convert (QUERY) - From Step2.3
  • Assembly Fasta File (TARGET)
  • Assembly Paf File (FILE) - From Step2.5

output 

Assembly Delta File (OUTPUT)

Usage

structural_pangenome convert paf2delta [-h] (-p PREFIX | -o OUTPUT) FILE TARGET QUERY

Example 

structural_pangenome convert paf2delta pafs/assembly_1_vs_pangenome__fasta.paf fastas/assembly_1.fas pangenomes/pangenome__fasta.fas -p outputs/paf2delta

haptic example

haptic convert paf2delta pafs/assembly_1_vs_pangenome__fasta.paf fastas/assembly_1.fas pangenomes/pangenome__fasta.fas -p outputs/paf2delta

Can be Parallelized 

Yes 

Optional 

No 
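To illustrate what a PAF-to-delta conversion has to do, the sketch below turns one PAF record into a MUMmer-style delta entry. A delta file starts with a line holding the two FASTA paths (presumably why TARGET and QUERY are passed to paf2delta) and a program line such as NUCMER; each alignment then has a ">reference query ref_len query_len" header, a coordinate line, and a list of indel offsets ending in 0. Assumptions: the PAF record carries a cg:Z: CIGAR tag (minimap2 writes it when run with -c), the error count is taken as block length minus matching bases, and every alignment gets its own header, whereas real delta files group alignments of the same sequence pair. Helper names are hypothetical; this is not the tool's code.

```python
# Illustrative sketch; helper names are hypothetical, not part of structural_pangenome/haptic.
import re
from typing import List

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def delta_file_header(target_fasta: str, query_fasta: str) -> str:
    """First two lines of a delta file: the two FASTA paths, then the program line."""
    return f"{target_fasta} {query_fasta}\nNUCMER\n"


def cigar_to_delta(cigar: str) -> List[int]:
    """Indel list: each value is the distance in alignment columns from the previous
    indel (counting the indel itself); positive = gap in the query, negative = gap
    in the reference; the list ends with 0."""
    deltas: List[int] = []
    since_last = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in "M=X":
            since_last += n
        elif op in "DI":
            sign = 1 if op == "D" else -1  # D: base only in reference; I: only in query
            for _ in range(n):
                deltas.append(sign * (since_last + 1))
                since_last = 0
        else:
            raise ValueError(f"unsupported CIGAR operation {op!r}")
    return deltas + [0]


def paf_to_delta_entry(line: str) -> str:
    """Delta header, coordinate line and indel list for one PAF record
    (reference = PAF target, query = PAF query)."""
    cols = line.rstrip("\n").split("\t")
    qname, qlen, qstart, qend, strand = cols[0], cols[1], int(cols[2]), int(cols[3]), cols[4]
    tname, tlen, tstart, tend = cols[5], cols[6], int(cols[7]), int(cols[8])
    matches, block_len = int(cols[9]), int(cols[10])
    cigar = next((c[5:] for c in cols[12:] if c.startswith("cg:Z:")), None)
    if cigar is None:
        raise ValueError("PAF record has no cg:Z: CIGAR tag")
    # PAF: 0-based, end-exclusive. Delta: 1-based, inclusive, with the query
    # coordinates running high-to-low for reverse-strand alignments.
    s2, e2 = (qstart + 1, qend) if strand == "+" else (qend, qstart + 1)
    errors = block_len - matches  # mismatches plus gap columns
    lines = [
        f">{tname} {qname} {tlen} {qlen}",
        f"{tstart + 1} {tend} {s2} {e2} {errors} {errors} 0",
        *(str(d) for d in cigar_to_delta(cigar)),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    paf = "seg1\t20\t2\t17\t+\tctg1\t100\t10\t26\t13\t17\ttp:A:P\tcg:Z:5M2D6M1I3M"
    print(delta_file_header("fastas/assembly_1.fas", "pangenomes/pangenome__fasta.fas"), end="")
    print(paf_to_delta_entry(paf), end="")
```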


Step4: Rendering dotplots

Description 

Optional: dotplot of the whole-genome alignment (WGA)

Command 

structural_pangenome render-dotplot delta_file -o output_file

OR

structural_pangenome render-dotplot delta_file -p prefix_folder 

haptic command

haptic render-dotplot delta_file -o output_file

OR

haptic render-dotplot delta_file -p prefix_folder 

Parameters 

o: output filename (with complete path) (OUTPUT)

p: prefix_folder which stores the resulting output file (with complete path) (PREFIX)

Input 

Assembly fasta delta file (DELTA) - From Step3.

output 

  • Assembly Fasta gp file
  • Assembly Fasta rplot file
  • Assembly Fasta fplot file
  • Assembly Fasta PNG file

Usage

structural_pangenome render-dotplot [-h] (-p PREFIX | -o OUTPUT) DELTA

Example 

structural_pangenome render-dotplot Step3.deltas/assembly_1_vs_pangenome__fasta.delta -p outputs/Step4.Render

haptic example

haptic render-dotplot deltas/assembly_1_vs_pangenome__fasta.delta -p outputs/dotplot

Can be Parallelized 

Yes 

Optional 

Yes 
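The gp, fplot and rplot outputs appear to match the file types written by MUMmer's mummerplot (a gnuplot script plus forward-match and reverse-match coordinates); whether render-dotplot delegates to mummerplot is an assumption. The core of any dotplot is the endpoints of each alignment, split by orientation, which can be read from a delta file as in this sketch (hypothetical helper name):

```python
# Illustrative sketch; delta_segments is a hypothetical helper, not the tool's code.
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[int, int, int, int]  # (ref_start, ref_end, query_start, query_end)


def delta_segments(lines: Iterable[str]) -> Dict[str, List[Segment]]:
    """Split delta alignments into forward and reverse matches for plotting."""
    split: Dict[str, List[Segment]] = {"forward": [], "reverse": []}
    for line in lines:
        fields = line.split()
        # Coordinate lines hold 7 integers; '>' headers, the path/program lines
        # and single-integer indel lines are skipped.
        if len(fields) == 7 and not line.startswith(">"):
            s1, e1, s2, e2 = map(int, fields[:4])
            split["forward" if s2 <= e2 else "reverse"].append((s1, e1, s2, e2))
    return split


if __name__ == "__main__":
    delta = ["a.fa p.fa", "NUCMER", ">ctg1 seg1 100 20",
             "11 26 3 17 4 4 0", "6", "1", "-7", "0", "30 40 20 10 0 0 0", "0"]
    for orientation, segments in delta_segments(delta).items():
        print(orientation, segments)
```

Each segment can then be drawn as a line from (ref_start, query_start) to (ref_end, query_end), in one colour for forward and another for reverse matches.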


Step5: Filtering Delta files

Description

Filter the minimap2 results (remove unwanted regions from the delta file)

Command

structural_pangenome filter-delta assembly_1_vs_pangenome__fasta.delta -o output_file
Parameters

o: output filename (with complete path) (OUTPUT)

p: prefix_folder which stores the resulting output file (with complete path) (PREFIX)

m: optional, metrics: save filtering metrics to a file (name auto-generated)

--inner_cutoff: optional, (INNER_CUTOFF): Threshold to filter the delta results based on the in between two existing fragments.

default value: 0.8

--outer_cutoff: optional, (OUTER_CUTOFF): Threshold to filter the delta results based on the overlap with another fragment.

default value: 0.5

Input

Assembly Delta File - From Step3.

output

Filtered Assembly Delta File

Usage

structural_pangenome filter-delta [-h] (-p PREFIX | -o OUTPUT) FILE [--inner_cutoff INNER_CUTOFF] [--outer_cutoff OUTER_CUTOFF] [-m]

Example

structural_pangenome filter-delta deltas/assembly_1_vs_pangenome__fasta.delta -o filtered_deltas/assembly_1_vs_pangenome__fasta_bbmh_filter.delta -m

Can be Parallelized

Yes

Optional

No
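The exact rules behind --inner_cutoff and --outer_cutoff are not spelled out on this page. Purely as a hypothetical illustration of an overlap-based cutoff such as --outer_cutoff (default 0.5), the sketch below drops an alignment when a strictly longer alignment covers more than that fraction of its reference span. Treat it as one possible reading, not as what filter-delta actually does; all names are hypothetical.

```python
# Hypothetical illustration only; this is NOT necessarily the rule filter-delta applies.
from typing import List, Tuple

Span = Tuple[int, int]  # 1-based inclusive reference span (start, end), start <= end


def span_length(span: Span) -> int:
    return span[1] - span[0] + 1


def overlap_fraction(a: Span, b: Span) -> float:
    """Fraction of span `a` that is also covered by span `b`."""
    covered = min(a[1], b[1]) - max(a[0], b[0]) + 1
    return max(0, covered) / span_length(a)


def drop_dominated(spans: List[Span], outer_cutoff: float = 0.5) -> List[Span]:
    """Drop a span when a strictly longer span covers more than `outer_cutoff` of it."""
    return [
        a for a in spans
        if not any(
            span_length(b) > span_length(a) and overlap_fraction(a, b) > outer_cutoff
            for b in spans
        )
    ]


if __name__ == "__main__":
    spans = [(1, 100), (60, 120), (90, 300)]
    print(drop_dominated(spans))       # (60, 120) is mostly inside (1, 100)
    print(drop_dominated(spans, 0.7))  # a stricter cutoff keeps all three
```

Reference coordinates in a delta file always run low-to-high, so the spans can be taken directly from the first two columns of each coordinate line.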


Step6: Rendering dotplots for filtered Delta files

Description 

Optional: dotplot of the whole-genome alignment (WGA) after filtering

Command 

structural_pangenome render-dotplot filtered_delta_file -o output_file

OR

structural_pangenome render-dotplot filtered_delta_file -p prefix_folder

haptic command

haptic render-dotplot filtered_delta_file -o output_file

OR

haptic render-dotplot filtered_delta_file -p prefix_folder

Parameters 

o: output filename (with complete path) (OUTPUT)

p: prefix_folder which stores the resulting output file (with complete path) (PREFIX)

Input 

Assembly fasta filtered delta file (DELTA)

output 

  • Assembly Fasta gp file
  • Assembly Fasta rplot file
  • Assembly Fasta fplot file
  • Assembly Fasta PNG file

Usage

structural_pangenome render-dotplot [-h] (-p PREFIX | -o OUTPUT) DELTA

Example 

structural_pangenome render-dotplot deltas/assembly_1_vs_pangenome__fasta_bbmhFilter.delta -p outputs/dotplot

haptic example

haptic render-dotplot deltas/assembly_1_vs_pangenome__fasta_bbmhFilter.delta -p outputs/dotplot

Can be Parallelized 

Yes 

Optional 

Yes 


Step7: Reverse Filtered Delta Files

Description

Reverse the delta file, moving the pangenome from query to reference and the assembly from reference to query

Command

structural_pangenome reverse-delta assembly_1_vs_pangenome__fasta_bbmhFilter.delta -o pangenome__fasta_vs_assembly_1_bbmhFilter.delta

Parameters

o: output filename (with complete path) (OUTPUT)

p: prefix_folder which stores the resulting output file (with complete path) (PREFIX)

Input

Assembly fasta filtered delta file (FILE)

output

Reversed filtered delta file (OUTPUT)

Usage

structural_pangenome reverse-delta [-h] (-p PREFIX | -o OUTPUT) FILE

Example

structural_pangenome reverse-delta filtered_deltas/_assembly1_v1_bbmh-filter.delta -o reversed_filtered_deltas/pangenome__fasta_vs_assembly_1_bbmhFilter.delta

Can be Parallelized

Yes

Optional

No
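Reversing a delta entry is more than swapping columns: the header and coordinates swap, every indel sign flips (a gap in the old query becomes a gap in the new reference), and for reverse-strand alignments the indel offsets have to be recounted from the other end, because the old query, now the reference, must run low-to-high. A sketch of that transformation for a single alignment, with hypothetical helper names (in a full file, the two FASTA paths on the first line would be swapped as well):

```python
# Illustrative sketch; helper names are hypothetical, not part of structural_pangenome/haptic.
from typing import List, Tuple


def reverse_header(header: str) -> str:
    """'>ref qry ref_len qry_len' becomes '>qry ref qry_len ref_len'."""
    ref, qry, ref_len, qry_len = header[1:].split()
    return f">{qry} {ref} {qry_len} {ref_len}"


def reverse_alignment(coords: List[int], deltas: List[int]) -> Tuple[List[int], List[int]]:
    """Swap reference and query for one delta alignment.

    coords = [s1, e1, s2, e2, errors, sim_errors, stop_codons]; deltas ends with 0.
    """
    s1, e1, s2, e2, *rest = coords
    steps = deltas[:-1]  # drop the terminating 0
    # Absolute alignment column of every indel; the sign flips because a gap in
    # the old query is a gap in the new reference, and vice versa.
    positions, signs, pos = [], [], 0
    for d in steps:
        pos += abs(d)
        positions.append(pos)
        signs.append(-1 if d > 0 else 1)
    if s2 <= e2:  # forward alignment: column order is unchanged
        new_coords = [s2, e2, s1, e1, *rest]
    else:
        # Reverse alignment: the new reference must run low-to-high, so the
        # columns are read from the other end of the alignment.
        n_cols = (e1 - s1 + 1) + sum(1 for d in steps if d < 0)
        positions = [n_cols - p + 1 for p in reversed(positions)]
        signs.reverse()
        new_coords = [e2, s2, e1, s1, *rest]
    new_deltas, prev = [], 0
    for p, sign in zip(positions, signs):
        new_deltas.append(sign * (p - prev))
        prev = p
    return new_coords, new_deltas + [0]


if __name__ == "__main__":
    print(reverse_header(">ctg1 seg1 100 20"))
    print(reverse_alignment([11, 26, 3, 17, 4, 4, 0], [6, 1, -7, 0]))
    print(reverse_alignment([11, 26, 17, 3, 4, 4, 0], [6, 1, -7, 0]))
```

Applying reverse_alignment twice returns the original alignment, which makes a convenient sanity check.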


Step8: Coordinate Pangenome Creation

Description 

Create the pangenome coordinate system as a JSON file

Command 

structural_pangenome build-coordinate reverse_filtered_delta_file -o output_file

OR

structural_pangenome build-coordinate delta_file -p prefix_folder

haptic command

haptic build-json reverse_filtered_delta_file -o output_file

OR

haptic build-json delta_file -p prefix_folder

Parameters 

o: output filename (with complete path)

p: prefix_folder which stores the resulting output file (with complete path)

unzipped, optional: set this flag to generate an uncompressed file (--unzipped)

pretty, optional: set this flag to generate a human-readable JSON file (--pretty)

m, optional: metrics, Whether to save coordinate metrics. Filename auto-generated if not given (-m)

Input 

Reversed filtered delta file(s) (DELTA [DELTA ...]) - From Step7

output 

Pangenome coordinate JSON file (OUTPUT)

Example 

structural_pangenome build-coordinate deltas/pangenome__fasta_vs_assembly_1_bbmhFilter.delta -o outputs/coordinate_json.json --pretty --unzipped -m

haptic example

haptic build-json deltas/pangenome__fasta_vs_assembly_1_bbmhFilter.delta -o outputs/coordinate-json.json

Usage

structural_pangenome build-coordinate [-h] (-p PREFIX | -o OUTPUT) [--unzipped] [--pretty] [-m [METRICS]] DELTA [DELTA ...]

Can be Parallelized 

Yes 

Optional 

No 
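build-coordinate writes a compressed file unless --unzipped is set. Assuming the compressed form is gzip (an assumption, since the compression format is not stated here), a small loader that accepts either variant could look like this; load_coordinate_json is a hypothetical name.

```python
# Illustrative sketch; gzip compression is an assumption, helper name is hypothetical.
import gzip
import json
from pathlib import Path
from typing import Any


def load_coordinate_json(path: str) -> Any:
    """Load a JSON file whether or not it is gzip-compressed."""
    raw = Path(path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        data = load_coordinate_json(arg)
        print(arg, type(data).__name__)
```

Detecting the format from the first two bytes, rather than from the file extension, keeps the loader working whichever output name was passed to -o.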


Step9: Pangenome Sequence Length Distribution

Description 

Get the sequence length distribution of the pangenome segments that have coordinates in the pangenome coordinate system

Command 

structural_pangenome find-coordinate-seqlen pangenome_coordinate_file sequence_lengths_file -o output_file -g

OR

structural_pangenome find-coordinate-seqlen pangenome_coordinate_file sequence_lengths_file -p prefix_folder -g

haptic command

haptic find-seqlen reverse_filtered_delta_file -o output_file -g

OR

haptic find-seqlen reverse_filtered_delta_file -p prefix_folder -g

Parameters 

o: output filename (with complete path)

p: prefix_folder which stores the resulting output file (with complete path)

g: optional, to yield sequence length distribution graph

Input 

pangenome coordinate file (COORDINATE)

sequence lengths file (.csv) (SEQLEN)

output 

pangenome_coordinate_seqLength Graph (OUTPUT)

Usage

structural_pangenome find-coordinate-seqlen [-h] (-p PREFIX | -o OUTPUT) [-g] COORDINATE SEQLEN

Example 

structural_pangenome find-coordinate-seqlen Step8.CoordPangenome/coordAssem_pangenome.json Step2.4.seqLens/pangenome__fasta_seqlen.csv -p outputs/seqlenPangenome -g

haptic example

haptic find-seqlen deltas/pangenome__fasta_vs_assembly_1_bbmhFilter.delta -p outputs/seqlen

Can be Parallelized 

Yes 

Optional 

Yes 


Step10: Pangenome Path Creation

Description

Build a pangenome path JSON file from the coordinate JSON file

Command

structural_pangenome build-path pangenome_coordinate_json_file -o output_file
Parameters

o: output filename (with complete path)

p: prefix_folder which stores the resulting output file (with complete path)

unzipped, optional: set this flag to generate an uncompressed file (--unzipped)

pretty, optional: set this flag to generate a human-readable JSON file (--pretty)

Input

Pangenome coordinate JSON file (JSON) - From Step8

output

Pangenome path JSON file (OUTPUT)

Usage

structural_pangenome build-path [-h] (-p PREFIX | -o OUTPUT) [--unzipped] [--pretty] JSON

Example

structural_pangenome build-path Step8.CoordPangenome/coordAssem_pangenome.json -o 10.PathPangenome/path__pangenome.json --unzipped --pretty

Can be Parallelized

Yes

Optional

No