Lipsync

A comprehensive solution for video lipsyncing with a suite of different models and enhancement options.

Available backends include:

Sync 2.0: This backend uses the latest model from Sync.
Sync 1.9.0 Beta: This backend uses the 1.9.0 model from Sync.
SieveSync 1.1: This backend uses the latest state-of-the-art LatentSync model, combining it with LivePortrait for higher-quality sync and CodeFormer for face enhancement.
Hummingbird: This backend uses the latest model from Tavus.
Latentsync: This backend uses the LatentSync model.
SieveSync: This backend uses a proprietary alignment technique with optimized MuseTalk and LivePortrait for faster inference and better sync with the audio. Videos with few motion or scene cuts work best with this backend.
MuseTalk: This backend uses the MuseTalk model combined with CodeFormer (optional but recommended) to sync the lips in the driver video/image with the provided audio and restore the face.
Video Retalking: This backend uses the Video Retalking model combined with GPEN and GFPGAN to sync the lips in the driver video/image with the provided audio.

For pricing, see the Pricing section.

For examples, see the Examples section.

For tips to ensure better performance, see Tips for better performance.

Ethical Considerations

Lipsync technologies come with social risks, particularly the potential for misuse in creating deepfakes. To mitigate these risks, it's crucial to follow ethical guidelines and adopt responsible usage practices. Currently, the synthesized results contain visual artifacts that may help in detecting deepfakes as well as watermarks that identify the use of Sieve. Please note that we do not assume any legal responsibility for the use of the results generated by this app.

Please reach out to us at sales@sievedata.com or via Discord if you have any questions or concerns, or if you want to request watermark removal.

Important Notes

Sync 2.0 and Hummingbird are preferred for overall sync.
SieveSync 1.1 is preferred for better face fidelity for a lower price.
SieveSync is a custom backend that combines multiple models, running at 25 FPS with high face fidelity and good lip movement.
No enhancement is applied for Sync 1.9.0 Beta and Sync 2.0.
Latentsync and SieveSync 1.1 work best when every frame has a face.
The multi-speaker option (enable_multispeaker) uses Sieve's Active Speaker Detection to determine which speaker is speaking at any given time. Detection is not always reliable and may not work for all videos.
Enhance applies restoration to the face only and does not affect the resolution of the video.
The processing time depends on video resolution and video length along with the amount of time a valid speaker is detected.
The default maximum duration allowed for new users is 30 seconds.

Tips for better performance:

Ensure there is only a single primary speaker in the video
Ensure the person is facing the camera
Ensure the person is not wearing any accessories that cover the mouth (e.g. mask, scarf, etc.)
Ensure the person is not moving their head too much
Ensure the person's face is not very small in the frame
The MuseTalk and SieveSync backends may perform unreliably if the person has a lot of facial hair
Setting downsample to true downsamples the video to 720p, which can reduce processing time and artifacts in unstable videos

Information on the check_quality parameter:

This is a boolean parameter that checks the visual quality of the output video. If an output video fails the quality check, it will be rejected and the function will raise an exception.
This feature is priced at $0.01 per video.
It only checks for major visual quality issues such as very prominent visual artifacts, noise, and encoding artifacts. It does not check for minor issues such as low face quality or poor lip sync.
This feature is not available for image input.

Information on the cut_by parameter:

The duration of the audio file always supersedes the duration of the video file.
When audio is selected and the video is shorter than the audio, the video plays to the end, then backward to the start, and so on until it matches the duration of the audio.
When video is selected and the video is shorter than the audio, the audio is cut off when the video ends.
When shortest is selected, the shorter of the two files decides the duration, and the longer file is cut off to match.
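
The rules above can be sketched in a few lines of Python. This is an illustration only, not Sieve's implementation: the function names output_duration and pingpong_time are hypothetical, and the literal strings "audio", "video", and "shortest" are assumed spellings of the three options. Apart from shortest, the sketch covers only the case the text describes, where the video is no longer than the audio.

```python
def output_duration(video_s: float, audio_s: float, cut_by: str) -> float:
    """Expected output length in seconds (hypothetical helper, not a Sieve API)."""
    if cut_by == "shortest":
        # The shorter file decides the duration.
        return min(video_s, audio_s)
    if video_s > audio_s:
        # The text only describes "audio" and "video" for a video that is
        # shorter than the audio, so this sketch does not guess the rest.
        raise ValueError("case not described: video longer than audio")
    if cut_by == "audio":
        # The video is looped forward and backward to fill the audio length.
        return audio_s
    if cut_by == "video":
        # The audio is cut off when the video ends.
        return video_s
    raise ValueError(f"unknown cut_by value: {cut_by!r}")


def pingpong_time(t: float, video_s: float) -> float:
    """Map output time t to a source-video time for forward/backward looping."""
    if video_s <= 0:
        raise ValueError("video_s must be positive")
    period = 2 * video_s           # one forward pass plus one backward pass
    phase = t % period
    # First half of the period plays forward, second half plays backward.
    return phase if phase <= video_s else period - phase
```

For a 4-second video and 10-second audio, output_duration(4.0, 10.0, "audio") returns 10.0, and at output time 5.0 the looped video shows the source frame at pingpong_time(5.0, 4.0), which is 3.0 seconds.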

Pricing

Backend	Enhance	Price per Minute	Notes
Sync 2.0	N/A	$3.00	Up to 35% usage discounts available. Reach out to sales@sievedata.com for monthly and enterprise plans.
Sync 1.9.0 Beta	N/A	$1.50	Up to 35% usage discounts available. Reach out to sales@sievedata.com for monthly and enterprise plans.
Hummingbird	N/A	$2.10	Discounts available. Reach out to sales@sievedata.com for monthly and enterprise plans.
SieveSync 1.1	True	$0.60
SieveSync 1.1	False	$0.45
Latentsync	True	$0.475
Latentsync	False	$0.325
SieveSync	True	$0.50
SieveSync	False	$0.35
MuseTalk	True	$0.35
MuseTalk	False	$0.20
Video Retalking	True	$0.45
Video Retalking	False	$0.30

Notes:

Discounts are available for high volume users. Please reach out to sales@sievedata.com or via Discord for more information.
If enable_multispeaker is set to true, there will be an additional charge of $0.065 per minute.
Any content above 1080p will be downsampled to 1080p
The "Enhance" option applies additional processing for improved quality
The check_quality parameter is priced at $0.01 per video
Prices are subject to change. Please refer to our latest documentation for the most up-to-date pricing information.
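
As a rough aid for budgeting, the table and notes above can be combined into a small estimator. This is a sketch under stated assumptions, not an official calculator: it assumes billing is linear in video minutes with no minimum charge, ignores volume discounts, and uses hypothetical names (PRICES, estimate_cost). The dictionary keys are lowercased display names from the pricing table, not confirmed API identifiers.

```python
# Per-minute prices in USD, copied from the pricing table above.
# Keys are lowercased display names (an assumption, not API identifiers).
# Backends listed with Enhance "N/A" have a single rate.
PRICES = {
    "sync 2.0": 3.00,
    "sync 1.9.0 beta": 1.50,
    "hummingbird": 2.10,
    "sievesync 1.1": {True: 0.60, False: 0.45},
    "latentsync": {True: 0.475, False: 0.325},
    "sievesync": {True: 0.50, False: 0.35},
    "musetalk": {True: 0.35, False: 0.20},
    "video retalking": {True: 0.45, False: 0.30},
}
MULTISPEAKER_PER_MINUTE = 0.065   # extra charge when enable_multispeaker is true
CHECK_QUALITY_PER_VIDEO = 0.01    # flat charge when check_quality is true


def estimate_cost(backend: str, minutes: float, enhance: bool = False,
                  multispeaker: bool = False, check_quality: bool = False) -> float:
    """Estimate the price in USD for one video (hypothetical helper)."""
    entry = PRICES[backend.lower()]
    # Backends without an Enhance option store a plain number.
    rate = entry[enhance] if isinstance(entry, dict) else entry
    if multispeaker:
        rate += MULTISPEAKER_PER_MINUTE
    total = rate * minutes
    if check_quality:
        total += CHECK_QUALITY_PER_VIDEO
    return round(total, 4)
```

For example, estimate_cost("MuseTalk", 2.0, enhance=True) returns 0.7, and adding multispeaker=True raises it to 0.83. Since prices are subject to change, the table values in the sketch should be checked against the latest documentation before relying on the result.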

Examples

Backend	Enhance	Price
SieveSync	True	$1.42
SieveSync	False	$0.10
MuseTalk	True	$0.122
Video Retalking	True	$0.07

(Driving video, driving audio, output media, and Sieve job links are not reproduced here.)
