Lipsync

A comprehensive solution for video lipsyncing with a suite of different models and enhancement options.

Available backends include:

  • Sync 1.9.0 Beta: This backend uses the 1.9.0 model from Sync.

  • Sievesync 1.1: This backend uses the state-of-the-art LatentSync model, combining it with LivePortrait for higher-quality sync and CodeFormer for face enhancement.

  • Latentsync: This backend uses the LatentSync model.

  • SieveSync: This backend uses a proprietary alignment technique with optimized MuseTalk and LivePortrait for faster inference and better sync with the audio. Videos without many motion/scene cuts work best with this backend.

  • MuseTalk: This backend uses the MuseTalk model combined with CodeFormer (optional but recommended) to sync the lips in the driver video/image with the provided audio and restore the face.

  • Video Retalking: This backend uses the Video Retalking model combined with GPEN and GFPGAN to sync the lips in the driver video/image with the provided audio.

For pricing, see the Pricing section below.

For examples, see the Examples section below.

For tips to ensure better performance, see Tips for better performance below.

Ethical Considerations

Lipsync technologies come with social risks, particularly the potential for misuse in creating deepfakes. To mitigate these risks, it’s crucial to follow ethical guidelines and adopt responsible usage practices. Currently, the synthesized results contain visual artifacts that may help in detecting deepfakes as well as watermarks that identify the use of Sieve. Please note that we do not assume any legal responsibility for the use of the results generated by this app.

Please reach out to us at sales@sievedata.com or via Discord if you have any questions or concerns or if you want to request a watermark removal.

Important Notes:

  • latentsync and sievesync-1.1 work best when every frame has a face.
  • If the selected backend fails with an error, the job automatically switches to the sievesync backend.
  • The enable_multispeaker option uses Sieve's Active Speaker Detection to determine which speaker is talking at any given time. This detection is not always reliable and may not work for all videos.
  • Enhance applies restoration to the face only and does not affect the resolution of the video.
  • No enhancement is applied for Sync 1.9.0 Beta.
  • Processing time depends on the video's resolution and length, and on how long a valid speaker is detected.
  • Sync 1.9.0 Beta is preferred for overall sync.
  • SieveSync 1.1 is preferred for better face fidelity.
  • SieveSync is a custom backend that combines multiple models, running at 25 FPS with high face fidelity and good lip movement.
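
The automatic fallback described in the notes above can be pictured as a small wrapper. This is a minimal Python sketch, not Sieve's implementation: `run_backend` is a hypothetical callable standing in for whatever runs a backend, and retrying exactly once is an assumption.

```python
from typing import Callable, Tuple

FALLBACK_BACKEND = "sievesync"  # the fallback backend named in the notes


def run_with_fallback(run_backend: Callable[[str], str], backend: str) -> Tuple[str, str]:
    """Run `backend`; if it raises, retry once with the fallback backend.

    `run_backend` is a hypothetical callable (an assumption for this sketch)
    that takes a backend name and returns an output path.
    Returns a (backend_used, output) pair.
    """
    try:
        return backend, run_backend(backend)
    except Exception:
        if backend == FALLBACK_BACKEND:
            raise  # the fallback itself failed, nothing left to try
        return FALLBACK_BACKEND, run_backend(FALLBACK_BACKEND)
```

Returning the backend that actually produced the output makes it easy for a caller to notice that a fallback happened.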

Tips for better performance:

  • Ensure there is only a single primary speaker in the video
  • Ensure the person is facing the camera
  • Ensure the person is not wearing any accessories that cover the mouth (e.g. mask, scarf, etc.)
  • Ensure the person is not moving their head too much
  • Ensure the person's face is not very small in the frame
  • The MuseTalk and SieveSync backends may perform unreliably when the person has a lot of facial hair
  • Setting downsample to true downsamples the video to 720p, which can reduce processing time and artifacts in unstable videos
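
To get a feel for what downsampling to 720p means for a given input, the sketch below computes target dimensions while preserving aspect ratio. It rests on assumptions that this README does not confirm: that "720p" refers to the shorter side of the frame, that smaller videos are left untouched, and that dimensions are rounded to even numbers. The function name `downsample_dims` is hypothetical.

```python
from typing import Tuple


def downsample_dims(width: int, height: int, target: int = 720) -> Tuple[int, int]:
    """Scale (width, height) so the shorter side is at most `target`.

    Assumption: "720p" means the shorter side is 720 pixels. Videos that are
    already at or below the target are returned unchanged.
    """
    short = min(width, height)
    if short <= target:
        return width, height
    # Round each side to the nearest even number, since many video codecs
    # expect even dimensions.
    new_w = max(2, round(width * target / short / 2) * 2)
    new_h = max(2, round(height * target / short / 2) * 2)
    return new_w, new_h
```

For example, a 1920x1080 input maps to 1280x720 under these assumptions, and calling the function with `target=1080` mirrors a 1080p cap.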

Information on the cut_by parameter:

  • The duration of the audio file always supersedes the duration of the video file.
  • When audio is selected and the video is shorter than the audio, the video plays to the end, then plays backward to the start, and repeats this back-and-forth until it matches the duration of the audio.
  • When video is selected and the video is shorter than the audio, the audio is cut off when the video ends.
  • When shortest is selected, the file with the shorter duration between the two decides the duration, and the files are cut off accordingly.
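
The cut_by rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sieve's code: the function names are hypothetical, the case where the video is longer than the audio is not spelled out for every option (the sketch assumes the audio duration wins, following the first rule), and the forward-backward loop is assumed not to repeat the first and last frames at each turn.

```python
def output_duration(video_s: float, audio_s: float, cut_by: str) -> float:
    """Output duration in seconds for a given cut_by value."""
    if cut_by == "audio":
        # The video is looped back and forth (or trimmed, by assumption)
        # to match the audio.
        return audio_s
    if cut_by in ("video", "shortest"):
        # Documented: a shorter video cuts the audio off. Assumed: a longer
        # video is trimmed to the audio, since audio duration supersedes.
        return min(video_s, audio_s)
    raise ValueError(f"unknown cut_by value: {cut_by!r}")


def pingpong_index(i: int, n_frames: int) -> int:
    """Source frame shown at output position i when looping back and forth."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    if n_frames == 1:
        return 0
    period = 2 * (n_frames - 1)  # one forward pass plus one backward pass
    pos = i % period
    return pos if pos < n_frames else period - pos
```

With a 4-frame video, `[pingpong_index(i, 4) for i in range(8)]` gives `[0, 1, 2, 3, 2, 1, 0, 1]`.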

Pricing

Backend           Enhance   Price per Minute
Sync 1.9.0 Beta   N/A       $1.50
                            Up to 35% usage discounts available.
                            Reach out to sales@sievedata.com for monthly and enterprise plans!
Sievesync 1.1     True      $0.60
Sievesync 1.1     False     $0.45
Latentsync        True      $0.475
Latentsync        False     $0.325
SieveSync         True      $0.50
SieveSync         False     $0.35
MuseTalk          True      $0.35
MuseTalk          False     $0.20
Video Retalking   True      $0.45
Video Retalking   False     $0.30

Notes:

  • Discounts are available for high volume users. Please reach out to sales@sievedata.com or via Discord for more information.
  • If enable_multispeaker is set to true, there will be an additional charge of $0.065 per minute.
  • Any content above 1080p will be downsampled to 1080p.
  • The "Enhance" option applies additional processing for improved quality
  • Prices are subject to change. Please refer to our latest documentation for the most up-to-date pricing information.
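
As a rough way to budget a job, the per-minute prices and the multi-speaker surcharge listed above can be combined as follows. This is a minimal sketch: the rates are copied from the pricing table and may be out of date, `estimate_price` is a hypothetical helper, and billing is assumed to be linear in the billed minutes with no discounts applied. How Sieve measures billed duration is not specified here.

```python
# Per-minute rates copied from the pricing table above (subject to change).
# Keys are (backend display name, enhance); Sync 1.9.0 Beta has no enhance option.
PRICE_PER_MINUTE = {
    ("Sync 1.9.0 Beta", None): 1.50,
    ("Sievesync 1.1", True): 0.60,
    ("Sievesync 1.1", False): 0.45,
    ("Latentsync", True): 0.475,
    ("Latentsync", False): 0.325,
    ("SieveSync", True): 0.50,
    ("SieveSync", False): 0.35,
    ("MuseTalk", True): 0.35,
    ("MuseTalk", False): 0.20,
    ("Video Retalking", True): 0.45,
    ("Video Retalking", False): 0.30,
}
MULTISPEAKER_SURCHARGE = 0.065  # extra dollars per minute with enable_multispeaker


def estimate_price(backend: str, minutes: float, enhance: bool = False,
                   enable_multispeaker: bool = False) -> float:
    """Estimate the cost in dollars, assuming linear per-minute billing."""
    key = (backend, None) if backend == "Sync 1.9.0 Beta" else (backend, enhance)
    if key not in PRICE_PER_MINUTE:
        raise ValueError(f"no listed price for {key!r}")
    rate = PRICE_PER_MINUTE[key]
    if enable_multispeaker:
        rate += MULTISPEAKER_SURCHARGE
    return round(rate * minutes, 4)
```

For instance, `estimate_price("MuseTalk", 2, enhance=True)` multiplies the $0.35 rate by two minutes.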

Examples

Backend           Enhance   Price    Sieve Job
SieveSync         True      $1.42    Here
SieveSync         False     $0.1     Here
MuseTalk          True      $0.122   Here
Video Retalking   True      $0.07    Here