ParSe v2: predict phase-separating protein regions from the primary sequence
Use this form to enter a protein sequence and predict which of its regions are disordered, and which subset of those disordered regions can undergo phase separation.
To analyze sequence sets using ParSe v2, please visit our other site.
For smaller sequence sets, a third site outputs the predicted phase separation potential, including the protein-protein interaction terms Uπ and Uq, which we have found helpful for screening related sequences.
Description
ParSe v2 is an update to the original ParSe algorithm.
ParSe v2 explores the possibility that protein-mediated phase separation can be predicted from sequence-based calculations of hydrophobicity, α-helix propensity, and a model of the polymer scaling exponent (νmodel). Using these three factors, the regions of any protein sequence can be parsed into three categories:
- P Regions (P) are intrinsically disordered and prone to undergo phase separation.
- D Regions (D) are intrinsically disordered but do not undergo phase separation.
- F Regions (F) may or may not be intrinsically disordered, but can fold to a stable conformation.
Reference
- Ibrahim, A.Y., Khaodeuanepheng, N.P., Amarasekara, D.L., Correia, J.J., Lewis, K.A., Fitzkee, N.C., Hough, L.E., Whitten, S.T. “Intrinsically disordered regions that drive phase separation form a robustly distinct protein class” J. Biol. Chem. 299, 102801 (2023). https://doi.org/10.1016/j.jbc.2022.102801
Primary Sequence
The maximum sequence length that can be analyzed is 10,000 residues; the minimum is 25. Only the 20 common amino acid types are accepted.
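The input limits above can be checked before a sequence is submitted. The following is a minimal sketch, not part of ParSe v2 itself: the function name `validate_sequence` and the constants are hypothetical, and the removal of whitespace and conversion to upper case are assumptions about how pasted input might be cleaned.

```python
# Hypothetical pre-submission check; names and cleaning steps are assumptions.
VALID_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 common amino acids
MIN_LENGTH = 25
MAX_LENGTH = 10_000


def validate_sequence(raw: str) -> str:
    """Return the cleaned sequence, or raise ValueError if it breaks a limit."""
    # Assumption: whitespace and letter case are not significant in the input.
    seq = "".join(raw.split()).upper()

    if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
        raise ValueError(
            f"sequence length {len(seq)} is outside {MIN_LENGTH}-{MAX_LENGTH} residues"
        )

    invalid = sorted(set(seq) - VALID_RESIDUES)
    if invalid:
        raise ValueError(f"unsupported residue codes: {', '.join(invalid)}")

    return seq
```

A sequence that passes comes back as a single upper-case string; one that is too short, too long, or contains a code such as X or B raises a `ValueError` naming the problem.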
- Sequence length:
- Σ classifier distance of P-labeled windows:
- Σ classifier distance of P-labeled windows + Uπ + Uq (trained using ∆h°):
- Σ classifier distance of P-labeled windows + Uπ + Uq (trained using csat at 4°C):
ParSe Results
Protein Regions
Each identified region is a stretch of 20 or more contiguous residues in which at least 90% of the residues carry the same label: P, D, or F.
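The region rule can be illustrated with a short sketch. This is one possible reading of the rule and not the ParSe v2 implementation: it assumes a per-residue label string is already available, scans left to right, requires each region to begin and end on its own label, and takes the longest qualifying stretch from each starting position. The function name `find_regions` and its parameters are hypothetical.

```python
# Illustrative sketch only; the greedy left-to-right scan is an assumption.
def find_regions(labels: str, min_len: int = 20, min_percent: int = 90):
    """Return (label, first_residue, last_residue) tuples, 1-based inclusive."""
    regions = []
    n = len(labels)
    start = 0
    while start + min_len <= n:
        label = labels[start]
        count = 0        # residues carrying `label` in labels[start:end + 1]
        best_end = None  # last position where the stretch still qualifies
        for end in range(start, n):
            if labels[end] != label:
                continue
            count += 1
            length = end - start + 1
            # Integer arithmetic avoids floating-point edge cases at exactly 90%.
            if length >= min_len and count * 100 >= min_percent * length:
                best_end = end
        if best_end is None:
            start += 1  # no qualifying region begins at this residue
        else:
            regions.append((label, start + 1, best_end + 1))
            start = best_end + 1
    return regions


# 18 P, one D, 5 P, then 30 D: the lone D is absorbed into the P region.
example = "P" * 18 + "D" + "P" * 5 + "D" * 30
print(find_regions(example))  # [('P', 1, 24), ('D', 25, 54)]
```

The scan is quadratic in the worst case, which is acceptable for illustration but would need tightening for sequences near the 10,000-residue limit.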