ParSe v2: predict phase-separating protein regions from the primary sequence

Use this form to submit a set of protein sequences and quickly identify those predicted to drive phase separation (PS). The algorithm also determines whether the submitted set is enriched or depleted in PS potential relative to the human proteome. The output includes a FASTA file containing the predicted phase-separating regions.

To analyze a single protein sequence using ParSe v2, please visit our other site.

The algorithm is described in Ibrahim et al., J. Biol. Chem. 299, 102801 (2023).

This web application and its output are described in Wilson et al., Protein Science, doi: 10.1002/pro.4756.

Input File

Upload a FASTA file of protein sequences, preferably in UniProtKB format. Files containing tens of thousands of sequences may take a minute or more to analyze. Sequences shorter than 25 residues or longer than 10,000 residues are ignored.
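The length filter described above can be illustrated with a minimal Python sketch. It assumes plain FASTA text as input and treats the 25 and 10,000 residue limits as inclusive bounds; the function names `read_fasta` and `analyzable` are hypothetical and are not part of ParSe v2.

```python
from typing import Iterator

# Limits stated on this page; treating them as inclusive is an assumption.
MIN_LEN = 25
MAX_LEN = 10_000


def read_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def analyzable(records) -> list[tuple[str, str]]:
    """Keep only sequences whose length lies within [MIN_LEN, MAX_LEN]."""
    return [(h, s) for h, s in records if MIN_LEN <= len(s) <= MAX_LEN]


example = ">sp|P1|SHORT\nMKT\n>sp|P2|OK\n" + "A" * 15 + "\n" + "A" * 15 + "\n"
kept = analyzable(read_fasta(example))
print([h for h, _ in kept])  # ['sp|P2|OK']
```

The wrapped 30-residue record is joined before its length is checked, so it passes, while the 3-residue record is dropped.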

File:

Name for the file (optional):


A. Distribution of predicted PS IDR lengths. ParSe v2 finds regions within proteins that are more than 90% labeled P; these are predicted to be phase-separating intrinsically disordered regions (PS IDRs). The y-axis gives the percentage of proteins in a set that contain a PS IDR at least as long as the length on the x-axis. The recall plot compares the percent of set for the human proteome against itself and against that of the input file.
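As a rough illustration of how a Plot A curve could be computed, the sketch below assumes each protein is represented by a hypothetical per-residue label string (for example `"PPIPF..."`) and finds the longest segment that is more than 90% `P`. Scoring `P` as +1 and any other label as -9 makes a segment's score positive exactly when its `P` fraction exceeds 0.9, so the task reduces to finding the widest pair of prefix sums with a strict increase. This is an assumed reconstruction, not the ParSe v2 implementation; `longest_ps_region` and `percent_of_set` are hypothetical names.

```python
def longest_ps_region(labels: str) -> int:
    """Length of the longest segment in which more than 90% of positions are 'P'.

    Hypothetical input: one label character per residue ('P', 'I', 'F', ...).
    Scoring 'P' as +1 and anything else as -9 makes a segment's score positive
    exactly when 10 * (P count) > 9 * (segment length), i.e. P fraction > 0.9.
    """
    prefix = [0]
    for c in labels:
        prefix.append(prefix[-1] + (1 if c == "P" else -9))
    m = len(prefix)

    # left_min[i] = min(prefix[0..i]); right_max[j] = max(prefix[j..m-1])
    left_min = prefix[:]
    for i in range(1, m):
        left_min[i] = min(left_min[i - 1], prefix[i])
    right_max = prefix[:]
    for j in range(m - 2, -1, -1):
        right_max[j] = max(right_max[j + 1], prefix[j])

    # Two-pointer scan for the widest i < j with prefix[i] < prefix[j].
    best = i = j = 0
    while i < m and j < m:
        if left_min[i] < right_max[j]:
            best = max(best, j - i)
            j += 1
        else:
            i += 1
    return best


def percent_of_set(values: list[float], thresholds: list[float]) -> list[float]:
    """Percent of proteins whose value is at least each threshold."""
    n = len(values)
    return [100.0 * sum(v >= t for v in values) / n for t in thresholds]


lengths = [longest_ps_region(s) for s in ["PPPPPPPPPIP", "PPPPPPPPPI", "IIFF"]]
print(lengths)  # [11, 9, 0]
print(percent_of_set(lengths, [0, 10, 20]))
```

The linear-time scan matters because sequences can be up to 10,000 residues; a naive check of every segment would be quadratic. A protein counts toward a given x value if its longest region reaches that length, which is why only the longest region is kept.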


B. Distribution of PS potentials. ParSe v2 sums the classifier distances of all windows labeled P, and we use this sum as a numerical estimate of PS potential. The y-axis gives the percentage of proteins in a set with a PS potential equal to or greater than the value on the x-axis. The recall plot compares the percent of set for the human proteome against itself and against that of the input file.
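A minimal sketch of the PS potential score and a Plot B style cumulative curve follows. It assumes ParSe v2 has already labeled each window and reported its classifier distance; the `(label, distance)` input format and the function names are hypothetical. Sorting the scores once and using `bisect_left` counts the proteins at or above each threshold without rescanning the whole set.

```python
from bisect import bisect_left


def ps_potential(windows: list[tuple[str, float]]) -> float:
    """Sum of classifier distances over windows labeled 'P'.

    Hypothetical input: one (label, distance) pair per sequence window.
    """
    return sum((dist for label, dist in windows if label == "P"), 0.0)


def percent_at_or_above(values: list[float], thresholds: list[float]) -> list[float]:
    """Percent of proteins with a value >= each threshold (sort once, then bisect)."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_left gives the count of values strictly below t; the rest are >= t.
    return [100.0 * (n - bisect_left(ordered, t)) / n for t in thresholds]


protein = [("P", 1.5), ("I", 2.0), ("P", 0.5), ("F", 3.0)]
print(ps_potential(protein))  # 2.0
print(percent_at_or_above([0.0, 2.0, 5.0, 10.0], [0.0, 2.0, 6.0]))  # [100.0, 75.0, 25.0]
```

Only the P-labeled windows contribute, so a protein with no P windows scores 0.0.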

Sequence-calculated properties (sort this table by its 3rd or 4th column to find phase-separating proteins):

Plot A percent of set and recall data:

Plot B percent of set and recall data:

Predicted PS IDRs from the input file (those with N ≥ 50):