The main problem that has to be solved in any manipulation of the STFT is that individual signal components (sinusoids, impulses) are spread over multiple frames and multiple STFT frequency locations (bins). This is a consequence of analysing the signal with overlapping, tapered windows. Windowing causes spectral leakage, so the information belonging to a single sinusoidal component is spread over adjacent STFT bins. To avoid border effects caused by the tapering of the analysis windows, the windows overlap in time. Because of this overlap, adjacent STFT analysis frames are strongly correlated (a sinusoid present in the analysis frame at time "t" will be present in the subsequent frames as well). Signal transformation with the phase vocoder therefore requires that every modification made in the STFT representation preserve the appropriate correlation between adjacent frequency bins (vertical coherence) and between adjacent time frames (horizontal coherence). Except for extremely simple synthetic sounds, these correlations can only be preserved approximately, and since the invention of the phase vocoder research has mainly been concerned with finding algorithms that preserve the vertical and horizontal coherence of the STFT representation after modification. For time-scaling operations, amplitude coherence is only a minor problem, because shifting analysis frames in time has only a minor impact on the amplitudes. The phase coherence problem, by contrast, was studied for a considerable time before appropriate solutions emerged.
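
The two effects described above can be illustrated with a small numerical sketch. The Python code below is not taken from any particular phase vocoder implementation; the frame length (64 samples), hop size (16 samples), Hann window and test frequency (10.3 bins) are assumptions chosen only for illustration. It analyses a single sinusoid in two overlapping frames and shows that its energy leaks into neighbouring bins, and that the phase at the peak bin advances between frames by approximately the sinusoid's frequency times the hop size. That phase relationship is the horizontal coherence a modification has to respect.

```python
import cmath
import math

N = 64                 # frame length in samples (illustrative assumption)
HOP = 16               # hop between frames, i.e. 75 % overlap (assumption)
FREQ_IN_BINS = 10.3    # sinusoid frequency, deliberately between two bins
OMEGA = 2 * math.pi * FREQ_IN_BINS / N   # frequency in radians per sample


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive DFT returning bins 0 .. len(frame) // 2."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def stft_frame(signal, start, window):
    """Spectrum of one windowed analysis frame beginning at sample `start`."""
    return dft([signal[start + i] * w for i, w in enumerate(window)])


def wrap(phase):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


signal = [math.cos(OMEGA * t) for t in range(N + HOP)]
window = hann(N)
frame0 = stft_frame(signal, 0, window)
frame1 = stft_frame(signal, HOP, window)

# Spectral leakage: one sinusoid occupies several adjacent bins.
mags = [abs(x) for x in frame0]
peak = max(range(len(mags)), key=mags.__getitem__)
leak_bins = [k for k, m in enumerate(mags) if m > 0.2 * mags[peak]]

# Horizontal coherence: the phase advance between overlapping frames at the
# peak bin is set by the sinusoid's frequency and the hop size.
measured = cmath.phase(frame1[peak] / frame0[peak])
expected = wrap(OMEGA * HOP)

print("peak bin:", peak)
print("bins above 20 % of the peak magnitude:", leak_bins)
print("measured phase advance:", measured)
print("expected phase advance:", expected)
```

If the phases in `frame1` were altered without regard to this relationship, the overlapping frames would no longer describe the same sinusoid, and resynthesis by overlap-add would produce partial cancellation between frames.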

