Note (Help Messages): adding the -h flag after a command (e.g. structural_pangenome find-ngaps -h) prints the help message for that command.
Step1: Ngap detection
Description | Report the stretch of Ns (Ngap) in each assembly |
Command | structural_pangenome find-ngaps assembly_fasta_file -o output_file OR structural_pangenome find-ngaps assembly_fasta_file -p prefix_folder |
haptic command | haptic find-ngaps assembly_fasta_file -o output_file OR haptic find-ngaps assembly_fasta_file -p prefix_folder |
Parameters | o: output filename (with complete path) p: prefix_folder which stores the resulting output file. t: optional parameter, threshold: minimum recurrence of Ngap sequence length that affects statistics (integer) |
Input | Fasta file for each assembly (FILE) |
output | Ngap.bed file for each assembly (OUTPUT) |
Usage | structural_pangenome find-ngaps [-h] (-p PREFIX | -o OUTPUT) [-t THRESHOLD] FILE |
Example | structural_pangenome find-ngaps fastas/assembly_1.fas -o outputs/1.Ngaps/resulting_ngaps.bed |
haptic example | haptic find-ngaps fastas/assembly_1.fas -o outputs/resulting_ngaps.bed |
Can be Parallelized | Yes |
Optional | No |
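To illustrate what an Ngap report contains, the sketch below scans FASTA text for runs of N and prints BED-style records. It is a minimal sketch, not the code behind find-ngaps: the names read_fasta and find_ngaps, the min_len argument (which is not the tool's -t threshold), and the three-column, 0-based half-open BED layout are assumptions made for this example.

```python
import re
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_ngaps(text: str, min_len: int = 1) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for every run of N/n at least min_len long.

    Coordinates are 0-based and half-open, as in BED (an assumption here).
    """
    gaps = []
    for name, seq in read_fasta(text):
        for match in re.finditer(r"[Nn]+", seq):
            if match.end() - match.start() >= min_len:
                gaps.append((name, match.start(), match.end()))
    return gaps


example = ">chr1 test\nACGTNNNNAC\nGTNNACGT\n>chr2\nNNACGT\n"
for name, start, end in find_ngaps(example):
    print(f"{name}\t{start}\t{end}")
```

Gaps are located on the concatenated sequence, so a run of N that spans a line break in the FASTA file is reported as a single interval.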
Step2.1: Indexing
Description | Indexing the assembly fasta file |
Command | structural_pangenome index assembly_fasta_file -o output_file OR structural_pangenome index assembly_fasta_file -p prefix_folder |
haptic command | haptic index assembly_fasta_file -o output_file OR haptic index assembly_fasta_file -p prefix_folder |
Parameters | o: output filename (with complete path) p: prefix_folder which stores the resulting output file. |
Input | Fasta file for each assembly (FILE) |
output | Indexed fasta file for each assembly (OUTPUT) |
Usage | structural_pangenome index [-h] (-p PREFIX | -o OUTPUT) FILE |
Example | structural_pangenome index fastas/assembly_1.fas -p outputs/2.1.indexed_fasta |
haptic example | haptic index fastas/assembly_1.fas -p outputs/2.1.indexed_fasta/ |
Can be Parallelized | Yes |
Optional | No |
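Since this step can be parallelized, one indexing command can be launched per assembly. The sketch below builds the documented `structural_pangenome index FILE -p PREFIX` command for each FASTA file and runs the commands through a thread pool. The helper names (index_command, run_all, real_runner, dry_runner), the worker count, and the file paths are hypothetical; the runner is injectable so the example can be dry-run without the tool installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def index_command(fasta: str, prefix: str) -> List[str]:
    """Build the documented 'index FILE -p PREFIX' command as an argv list."""
    return ["structural_pangenome", "index", fasta, "-p", prefix]


def run_all(
    commands: Sequence[List[str]],
    runner: Callable[[List[str]], int],
    workers: int = 4,
) -> List[int]:
    """Run every command concurrently and return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))


def real_runner(argv: List[str]) -> int:
    """Execute one command and return its exit code."""
    return subprocess.run(argv, check=False).returncode


def dry_runner(argv: List[str]) -> int:
    """Print the command instead of executing it."""
    print(" ".join(argv))
    return 0


fastas = ["fastas/assembly_1.fas", "fastas/assembly_2.fas"]  # hypothetical paths
commands = [index_command(f, "outputs/2.1.indexed_fasta") for f in fastas]
exit_codes = run_all(commands, dry_runner)  # swap in real_runner to execute
```

The same pattern applies to any other step marked as parallelizable, with only the command builder changing.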
Step2.2: Iterative Pangenome graph build
Description | Build the pangenome graph iteratively with minigraph. The order in which the assemblies are processed must be defined beforehand: first choose the assembly that is best to start with, then rank the other genomes by the completeness of their assembly and by their distance from the first chosen assembly (from closest to farthest). |
Command | structural_pangenome build-graph assembly1_fasta_file assembly2_fasta_file -o assembly1__assembly2_graph_filename OR structural_pangenome build-graph assembly1_fasta_file assembly2_fasta_file -p folder_to_populate_output |
haptic command | haptic build-graph assembly1_fasta_file assembly2_fasta_file -o assembly1__assembly2_graph_filename OR haptic build-graph assembly1_fasta_file assembly2_fasta_file -p folder_to_populate_output |
Parameters | o: output filename (with complete path) p: prefix_folder which stores the resulting output file (with complete path) |
Input | Fasta file for each assembly (FASTA), and possibly an existing gfa file to extend (GRAPH) |
output | One gfa file for all assemblies (OUTPUT) |
Usage | structural_pangenome build-graph [-h] (-p PREFIX | -o OUTPUT) GRAPH FASTA |
Example | structural_pangenome build-graph fastas/assembly_1.fas fastas/assembly_2.fas -p outputs/2.2gfas/assembly_pangenome_graph |
haptic example | haptic build-graph fastas/assembly_1.fas fastas/assembly_2.fas -p outputs/assembly_pangenome_graph |
Parallelizable | No |
Optional | No |
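The iterative build described above can be sketched as a chain of build-graph calls over a ranked list of assemblies. This is a sketch under stated assumptions: it assumes that the graph written by one call is passed as GRAPH to the next call, and the function name build_graph_commands and the output file names (graph_iter1.gfa, ...) are hypothetical. The commands are only built and printed, not executed.

```python
from typing import List


def build_graph_commands(ordered_fastas: List[str], out_dir: str) -> List[List[str]]:
    """Chain build-graph calls: each output graph is the GRAPH of the next call.

    ordered_fastas must already be ranked (starting assembly first).
    Output file names are hypothetical.
    """
    if len(ordered_fastas) < 2:
        raise ValueError("need at least two assemblies to build a graph")
    commands = []
    graph = ordered_fastas[0]  # the first iteration starts from the first assembly
    for step, fasta in enumerate(ordered_fastas[1:], start=1):
        output = f"{out_dir}/graph_iter{step}.gfa"
        commands.append(
            ["structural_pangenome", "build-graph", graph, fasta, "-o", output]
        )
        graph = output  # assumption: the new graph is extended in the next round
    return commands


ranked = ["fastas/assembly_1.fas", "fastas/assembly_2.fas", "fastas/assembly_3.fas"]
for argv in build_graph_commands(ranked, "outputs/2.2gfas"):
    print(" ".join(argv))
```

Each command depends on the output of the previous one, which is why this step cannot be parallelized.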
Step2.3: Graph & stat
Description | Extract all fragmented sequences from the minigraph pangenome graph into a FASTA file |
Command | structural_pangenome convert gfa2fa assembly_1_2_3.gfa > pangenome__fasta.fas |
Parameters | gfa2fa: convert GFA file to FASTA file |
Input | One gfa file for all assemblies (GFA) |
Output | One FASTA file for the whole pangenome (OUTPUT) |
Usage | structural_pangenome convert gfa2fa GFA > OUTPUT |
Example | structural_pangenome convert gfa2fa 2.2.gfas/assembly123.gfa > 2.3.Stat/pangenome__fasta.fas |
Parallelizable | No |
Optional | No |
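For readers unfamiliar with the GFA format, the conversion can be pictured as writing one FASTA record per segment (S-line) of the graph. The code below is a minimal sketch of that idea, not the gfa2fa implementation; the function names and the 60-column line wrapping are assumptions.

```python
from typing import Iterator, Tuple


def gfa_segments(gfa_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (segment_name, sequence) for every S-line of a GFA file."""
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # S-lines are: S <name> <sequence> [optional tags...]
        # A sequence of '*' means the sequence is not stored; skip it.
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            yield fields[1], fields[2]


def gfa_to_fasta(gfa_text: str, width: int = 60) -> str:
    """Render all segments as FASTA, wrapping sequence lines at `width`."""
    out = []
    for name, seq in gfa_segments(gfa_text):
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + ("\n" if out else "")


example_gfa = "S\ts1\tACGTACGT\tLN:i:8\nS\ts2\tGGCC\nL\ts1\t+\ts2\t+\t0M\n"
print(gfa_to_fasta(example_gfa), end="")
```

Link (L) lines carry the graph topology and are ignored here, since only the segment sequences end up in the FASTA file.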
Step2.4: Pangenome Sequence Length distribution
Description | Get the pangenome sequence length distribution |
Command | structural_pangenome find-seqlen pangenome_fasta_file -o output_file OR structural_pangenome find-seqlen pangenome_fasta_file -p prefix_folder |
haptic command | haptic find-seqlen pangenome_fasta_file -o output_file OR haptic find-seqlen pangenome_fasta_file -p prefix_folder |
Parameters | o: output filename (with complete path) (OUTPUT) p: prefix_folder which stores the resulting output file (with complete path) (PREFIX) g: optional parameter which also displays a graph of the sequence length distribution (-g) |
Input | Pangenome Fasta file (FILE) - From Step2.3 |
output | One csv file for the whole Pangenome (OUTPUT) |
Usage | structural_pangenome find-seqlen [-h] (-p PREFIX | -o OUTPUT) [-g] FILE |
Example | structural_pangenome find-seqlen pangenomes/pangenome__fasta.fas -o outputs/2.4.PangSeqLen/pangenome_seqlen.csv -g |
haptic example | haptic find-seqlen pangenomes/pangenome__fasta.fas -o outputs/pangenome_seqlen.csv |
Can be Parallelized | No |
Optional | Yes |
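A sequence length distribution of this kind can be sketched as a count of FASTA records per distinct length. The example below is illustrative only: the function names and the `length,count` CSV layout are assumptions, and the real output columns of find-seqlen may differ.

```python
import csv
import io
from collections import Counter
from typing import Dict


def sequence_lengths(fasta_text: str) -> Dict[str, int]:
    """Map each FASTA record name to the length of its sequence."""
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def length_distribution_csv(fasta_text: str) -> str:
    """Return CSV text with one row per distinct length and its count.

    The 'length,count' column layout is an assumption for this sketch.
    """
    counts = Counter(sequence_lengths(fasta_text).values())
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["length", "count"])
    for length in sorted(counts):
        writer.writerow([length, counts[length]])
    return buffer.getvalue()


example = ">s1\nACGT\nAC\n>s2\nGGCCTT\n>s3\nA\n"
print(length_distribution_csv(example), end="")
```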
Step2.5: Assembly vs pangenome
Description | Compare Assembly vs pangenome |
Command | structural_pangenome align indexed_assembly_fasta_file pangenome_fasta_file -o output_file OR structural_pangenome align indexed_assembly_fasta_file pangenome_fasta_file -p prefix_folder |
haptic command | haptic align indexed_assembly_fasta_file pangenome_fasta_file -o output_file OR haptic align indexed_assembly_fasta_file pangenome_fasta_file -p prefix_folder |
Parameters | o: output filename (with complete path) p: prefix_folder which stores the resulting output file (with complete path) |
Input | Indexed fasta file for each assembly (TARGET) - From Step2.1, and the Pangenome Fasta file (QUERY) - From Step2.3 |
output | Pangenome paf file (OUTPUT) |
Usage | structural_pangenome align [-h] (-p PREFIX | -o OUTPUT) TARGET QUERY |
Example | structural_pangenome align fastas/assembly_1_indexed.fas pangenome/pangenome__fasta.fas -o outputs/output_pangenome.paf |
haptic example | haptic align fastas/assembly_1_indexed.fas pangenome/pangenome__fasta.fas -o outputs/output_pangenome.paf |
Can be Parallelized | Yes |
Optional | No |
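The PAF file produced by this step is a tab-separated text format with 12 mandatory columns, optionally followed by tags. The sketch below parses one line into a named record and computes a simple identity value; the type and function names are hypothetical and the example line is made up for illustration. With the argument order documented above, the query columns describe pangenome sequences and the target columns describe the assembly.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns, in file order."""
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    residue_matches: int
    block_length: int
    mapping_quality: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse one PAF line, ignoring any optional tags after column 12."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(f)}")
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )


def identity(rec: PafRecord) -> float:
    """Fraction of matching residues over the alignment block."""
    return rec.residue_matches / rec.block_length if rec.block_length else 0.0


# Made-up example line: pangenome segment s1 aligned to assembly contig chr1.
line = "s1\t1000\t0\t500\t+\tchr1\t50000\t100\t600\t480\t500\t60\ttp:A:P"
rec = parse_paf_line(line)
print(rec.query_name, rec.target_name, f"{identity(rec):.2f}")
```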
Step3: Convert the PAF file into a delta file
Description | Convert the PAF file into a delta file. |
Command | structural_pangenome convert paf2delta paf_file assembly__fasta pangenome__fasta -o output_file OR structural_pangenome convert paf2delta paf_file assembly__fasta pangenome__fasta -p prefix_folder |
haptic command | haptic convert paf2delta paf_file assembly__fasta pangenome__fasta -o output_file OR haptic convert paf2delta paf_file assembly__fasta pangenome__fasta -p prefix_folder |
Parameters | o: output filename (with complete path) p: prefix_folder which stores the resulting output file (with complete path) |
Input | PAF file (FILE) - From Step2.5, assembly Fasta file (TARGET), and pangenome Fasta file (QUERY) |
output | Assembly Delta File (OUTPUT) |
Usage | structural_pangenome convert paf2delta FILE TARGET QUERY [-h] (-p PREFIX | -o OUTPUT) |
Example | structural_pangenome convert paf2delta pafs/assembly_1_vs_pangenome__fasta.paf fastas/assembly_1.fas pangenomes/pangenome__fasta.fas -p outputs/paf2delta |
haptic example | haptic convert paf2delta pafs/assembly_1_vs_pangenome__fasta.paf fastas/assembly_1.fas pangenomes/pangenome__fasta.fas -p outputs/paf2delta |
Can be Parallelized | Yes |
Optional | No |
Step4: Rendering dotplots
Description | Optional dotplot of the whole-genome alignment (WGA) |
Command | structural_pangenome render-dotplot delta_file -o output_file OR structural_pangenome render-dotplot delta_file -p prefix_folder |
haptic command | haptic render-dotplot delta_file -o output_file OR haptic render-dotplot delta_file -p prefix_folder |
Parameters | o: output filename (with complete path) (OUTPUT) p: prefix_folder which stores the resulting output file (with complete path) (PREFIX) |
Input | Assembly fasta delta file (DELTA) - From Step3. |
output | · Assembly Fasta gp file · Assembly Fasta rplot file · Assembly Fasta fplot file · Assembly Fasta PNG file |
Usage | structural_pangenome render-dotplot [-h] (-p PREFIX | -o OUTPUT) DELTA |
Example | structural_pangenome render-dotplot Step3.deltas/assembly_1_vs_pangenome__fasta.delta -p outputs/Step4.Render |
haptic example | haptic render-dotplot deltas/assembly_1_vs_pangenome__fasta.delta -p outputs/dotplot |
Can be Parallelized | Yes |
Optional | Yes |
Step5: Filtering Delta files
Description | Filter the minimap2 result (filter unwanted regions from delta file) |
Command | structural_pangenome filter-delta delta_file -o output_file OR structural_pangenome filter-delta delta_file -p prefix_folder |
Parameters | o: output filename (with complete path) (OUTPUT) p: prefix_folder which stores the resulting output file (with complete path) (PREFIX) m: optional, metrics: save filtering metrics to a file (name auto-generated) --inner_cutoff: optional (INNER_CUTOFF): threshold for filtering delta results that lie in between two existing fragments; default value: 0.8 --outer_cutoff: optional (OUTER_CUTOFF): threshold for filtering delta results based on their overlap with another fragment; default value: 0.5 |
Input | Assembly Delta File - From Step3. |
output | Filtered Assembly Delta File |
Usage | structural_pangenome filter-delta [-h] (-p PREFIX | -o OUTPUT) FILE [--inner_cutoff INNER_CUTOFF] [--outer_cutoff OUTER_CUTOFF] [-m] |
Example | structural_pangenome filter-delta deltas/assembly_1_vs_pangenome__fasta.delta -o filtered_deltas/assembly_1_vs_pangenome__fasta_bbmh_filter.delta -m |
Can be Parallelized | Yes |
Optional | No |
Step6: Rendering dotplots for filtered Delta files
Description | Optional dotplot of the whole-genome alignment (WGA), rendered from the filtered delta file |
Command | structural_pangenome render-dotplot filtered_delta_file -o output_file OR structural_pangenome render-dotplot filtered_delta_file -p prefix_folder |
haptic command | haptic render-dotplot filtered_delta_file -o output_file OR haptic render-dotplot filtered_delta_file -p prefix_folder |
Parameters | o: output filename (with complete path) (OUTPUT) p: prefix_folder which stores the resulting output file (with complete path) (PREFIX) |
Input | Assembly fasta filtered delta file (DELTA) |
output | · Assembly Fasta gp file · Assembly Fasta rplot file · Assembly Fasta fplot file · Assembly Fasta PNG file |
Usage | structural_pangenome render-dotplot [-h] (-p PREFIX | -o OUTPUT) DELTA |
Example | structural_pangenome render-dotplot deltas/assembly_1_vs_pangenome__fasta_bbmhFilter.delta -p outputs/dotplot |
haptic example | haptic render-dotplot deltas/assembly_1_vs_pangenome__fasta_bbmhFilter.delta -p outputs/dotplot |
Can be Parallelized | Yes |
Optional | Yes |
Step7: Reverse Filtered Delta Files
Description | Reverse the filtered delta file, so that assembly_1_vs_pangenome becomes pangenome_vs_assembly_1 |
Command | structural_pangenome reverse-delta assembly_1_vs_pangenome__fasta_bbmhFilter.delta -o pangenome__fasta_vs_assembly_1_bbmhFilter.delta |
Parameters | o: output filename (with complete path) (OUTPUT) p: prefix_folder which stores the resulting output file (with complete path) (PREFIX) |
Input | Assembly fasta filtered delta file (FILE) |
output | Reverse filtered delta file (OUTPUT) |
Usage | structural_pangenome reverse-delta [-h] (-p PREFIX | -o OUTPUT) FILE |
Example | structural_pangenome reverse-delta filtered_deltas/_assembly1_v1_bbmh-filter.delta -o reversed_filtered_deltas/pangenome__fasta_vs_assembly_1_bbmhFilter.delta |
Can be parallelized | Yes |
Optional | No |
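Conceptually, reversing an alignment means exchanging the roles of its two sequences. The sketch below shows this on a single simplified alignment block. It assumes a MUMmer-style convention (first-sequence coordinates ascend, a reverse-strand match has its second-sequence start greater than its end) and ignores the header and indel offsets of a real delta file; the Alignment type and reverse_alignment function are hypothetical and are not taken from reverse-delta itself.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One alignment block; ref coordinates ascend, and a reverse-strand
    match is encoded by query_start > query_end (assumed convention)."""
    ref_name: str
    query_name: str
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int


def reverse_alignment(aln: Alignment) -> Alignment:
    """Swap reference and query while keeping the new reference ascending."""
    new_ref_start, new_ref_end = aln.query_start, aln.query_end
    new_query_start, new_query_end = aln.ref_start, aln.ref_end
    if new_ref_start > new_ref_end:
        # Reverse-strand block: flip both intervals so the strand is
        # still expressed on the (new) query side.
        new_ref_start, new_ref_end = new_ref_end, new_ref_start
        new_query_start, new_query_end = new_query_end, new_query_start
    return Alignment(
        aln.query_name, aln.ref_name,
        new_ref_start, new_ref_end, new_query_start, new_query_end,
    )


forward = Alignment("assembly_1_chr1", "s1", 100, 600, 1, 500)
reverse = Alignment("assembly_1_chr1", "s2", 700, 900, 200, 1)
print(reverse_alignment(forward))
print(reverse_alignment(reverse))
```

Applying the function twice returns the original block, which is a convenient sanity check for any reversal of this kind.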
Step8: Coordinate Pangenome Creation
Description | Create the coordinate pangenome system in JSON |
Command | structural_pangenome build-coordinate reverse_filtered_delta_file -o output_file OR structural_pangenome build-coordinate reverse_filtered_delta_file -p prefix_folder |
haptic command | haptic build-json reverse_filtered_delta_file -o output_file OR haptic build-json reverse_filtered_delta_file -p prefix_folder |
Parameters | o: output filename (with complete path) p: prefix_folder which stores the resulting output file (with complete path) unzipped, optional: set this flag to generate an uncompressed file (--unzipped) pretty, optional: set this flag to generate a human-readable json file (--pretty) m, optional: metrics, whether to save coordinate metrics; filename auto-generated if not given (-m) |
Input | Reversed filtered delta file(s) (DELTA [DELTA ...]) |
output | Pangenome coordinate JSON file (OUTPUT) |
Usage | structural_pangenome build-coordinate [-h] (-p PREFIX | -o OUTPUT) [--unzipped] [--pretty] [-m [METRICS]] DELTA [DELTA ...] |
Example | structural_pangenome build-coordinate deltas/pangenome__fasta_vs_assembly_1_bbmhFilter.delta -o outputs/coordinate_json.json --pretty --unzipped -m |
haptic example | haptic build-json deltas/pangenome__fasta_vs_assembly_1_bbmhFilter.delta -o outputs/coordinate-json.json |
Can be Parallelized | Yes |
Optional | No |
Step9: Pangenome Sequence Length Distribution
Description | Get the sequence length distribution of the pangenome segments that have coordinates in the coordinate pangenome |
Command | structural_pangenome find-coordinate-seqlen pangenome_coordinate_file sequence_lengths_file -o output_file -g OR structural_pangenome find-coordinate-seqlen pangenome_coordinate_file sequence_lengths_file -p prefix_folder -g |
haptic command | haptic find-seqlen reverse_filtered_delta_file -o output_file -g OR haptic find-seqlen reverse_filtered_delta_file -p prefix_folder -g |
Parameters | o: output filename (with complete path) p: prefix_folder which stores the resulting output file (with complete path) g: optional, to yield sequence length distribution graph |
Input | pangenome coordinate file (COORDINATE) sequence lengths file (.csv) (SEQLEN) |
output | pangenome_coordinate_seqLength Graph (OUTPUT) |
Usage | structural_pangenome find-coordinate-seqlen [-h] (-p PREFIX | -o OUTPUT) [-g] COORDINATE SEQLEN |
Example | structural_pangenome find-coordinate-seqlen Step8.CoordPangenome/coordAssem_pangenome.json Step2.4.seqLens/pangenome__fasta_seqlen.csv -p outputs/seqlenPangenome -g |
haptic example | haptic find-seqlen deltas/pangenome__fasta_vs_assembly_1_bbmhFilter.delta -p outputs/seqlen |
Can be Parallelized | Yes |
Optional | Yes |
Step10: Pangenome Path Creation
Description | Builds a pangenome path json file, based on the coordinate json file |
Command | structural_pangenome build-path pangenome_coordinate_json_file -o output_file OR structural_pangenome build-path pangenome_coordinate_json_file -p prefix_folder |
Parameters | o: output filename (with complete path) p: prefix_folder which stores the resulting output file (with complete path) unzipped, optional: set this flag to generate an uncompressed file (--unzipped) pretty, optional: set this flag to generate a human-readable json file (--pretty) |
Input | pangenome coordinate JSON file (JSON) |
output | pangenome path JSON file (OUTPUT) |
Usage | structural_pangenome build-path [-h] (-p PREFIX | -o OUTPUT) [--unzipped] [--pretty] JSON |
Example | structural_pangenome build-path Step8.CoordPangenome/coordAssem_pangenome.json -o 10.PathPangenome/path__pangenome.json --unzipped --pretty |
Can be Parallelized | Yes |
Optional | No |
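The "Can be Parallelized" and "Optional" rows of the steps above can be collected into a small manifest, which is handy when scripting the whole workflow. The sketch below transcribes those rows; the Step type, the PIPELINE list, and the two helper functions are hypothetical and are not part of structural_pangenome.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    number: str
    subcommand: str
    parallelizable: bool
    optional: bool


# Values transcribed from the step tables; the structure itself is hypothetical.
PIPELINE: List[Step] = [
    Step("1", "find-ngaps", True, False),
    Step("2.1", "index", True, False),
    Step("2.2", "build-graph", False, False),
    Step("2.3", "convert gfa2fa", False, False),
    Step("2.4", "find-seqlen", False, True),
    Step("2.5", "align", True, False),
    Step("3", "convert paf2delta", True, False),
    Step("4", "render-dotplot", True, True),
    Step("5", "filter-delta", True, False),
    Step("6", "render-dotplot", True, True),
    Step("7", "reverse-delta", True, False),
    Step("8", "build-coordinate", True, False),
    Step("9", "find-coordinate-seqlen", True, True),
    Step("10", "build-path", True, False),
]


def mandatory_steps(steps: List[Step]) -> List[str]:
    """Step numbers that cannot be skipped."""
    return [s.number for s in steps if not s.optional]


def serial_steps(steps: List[Step]) -> List[str]:
    """Step numbers that cannot be parallelized."""
    return [s.number for s in steps if not s.parallelizable]


print("mandatory:", ", ".join(mandatory_steps(PIPELINE)))
print("serial:", ", ".join(serial_steps(PIPELINE)))
```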