Commit Graph

955 Commits

Author SHA1 Message Date
comfyanonymous 8edbcf5209 Improve performance on some lowend GPUs. 2024-08-05 16:24:04 -04:00
a-One-Fan a178e25912
Fix Flux FP64 math on XPU (#4210) 2024-08-05 01:26:20 -04:00
comfyanonymous 78e133d041 Support simple diffusers Flux loras. 2024-08-04 22:05:48 -04:00
Silver 7afa985fba
Correct spelling 'token_weight_pars_t5' to 'token_weight_pairs_t5' (#4200) 2024-08-04 17:10:02 -04:00
comfyanonymous 3b71f84b50 ONNX tracing fixes. 2024-08-04 15:45:43 -04:00
comfyanonymous 0a6b008117 Fix issue with some custom nodes. 2024-08-04 10:03:33 -04:00
comfyanonymous f7a5107784 Fix crash. 2024-08-03 16:55:38 -04:00
comfyanonymous 91be9c2867 Tweak lowvram memory formula. 2024-08-03 16:44:50 -04:00
comfyanonymous 03c5018c98 Lower lowvram memory to 1/3 of free memory. 2024-08-03 15:14:07 -04:00
comfyanonymous 2ba5cc8b86 Fix some issues. 2024-08-03 15:06:40 -04:00
comfyanonymous 1e68002b87 Cap lowvram to half of free memory. 2024-08-03 14:50:20 -04:00
comfyanonymous ba9095e5bd Automatically use fp8 for diffusion model weights if:
Checkpoint contains weights in fp8.

There isn't enough memory to load the diffusion model in GPU vram.
2024-08-03 13:45:19 -04:00
comfyanonymous f123328b82 Load T5 in fp8 if it's in fp8 in the Flux checkpoint. 2024-08-03 12:39:33 -04:00
comfyanonymous 63a7e8edba More aggressive batch splitting. 2024-08-03 11:53:30 -04:00
comfyanonymous ea03c9dcd2 Better per model memory usage estimations. 2024-08-02 18:09:24 -04:00
comfyanonymous 3a9ee995cf Tweak regular SD memory formula. 2024-08-02 17:34:30 -04:00
comfyanonymous 47da42d928 Better Flux vram estimation. 2024-08-02 17:02:35 -04:00
Alexander Brown ce9ac2fe05
Fix clip_g/clip_l mixup (#4168) 2024-08-01 21:40:56 -04:00
comfyanonymous e638f2858a Hack to make all resolutions work on Flux models. 2024-08-01 21:39:18 -04:00
comfyanonymous d420bc792a Tweak the memory usage formulas for Flux and SD. 2024-08-01 17:53:45 -04:00
comfyanonymous d965474aaa Make ComfyUI split batches a higher priority than weight offload. 2024-08-01 16:39:59 -04:00
comfyanonymous 1c61361fd2 Fast preview support for Flux. 2024-08-01 16:28:11 -04:00
comfyanonymous a6decf1e62 Fix bfloat16 potentially not being enabled on mps. 2024-08-01 16:18:44 -04:00
comfyanonymous 48eb1399c0 Try to fix mac issue. 2024-08-01 13:41:27 -04:00
comfyanonymous d7430a1651 Add a way to load the diffusion model in fp8 with UNETLoader node. 2024-08-01 13:30:51 -04:00
comfyanonymous f2b80f95d2 Better Mac support on flux model. 2024-08-01 13:10:50 -04:00
comfyanonymous 1aa9cf3292 Make lowvram more aggressive on low memory machines. 2024-08-01 12:11:57 -04:00
comfyanonymous eb96c3bd82 Fix .sft file loading (they are safetensors files). 2024-08-01 11:32:58 -04:00
comfyanonymous 5f98de7697 Load flux t5 in fp8 if weights are in fp8. 2024-08-01 11:05:56 -04:00
comfyanonymous 8d34211a7a Fix old python versions no longer working. 2024-08-01 09:57:20 -04:00
comfyanonymous 1589b58d3e Basic Flux Schnell and Flux Dev model implementation. 2024-08-01 09:49:29 -04:00
comfyanonymous 7ad574bffd Mac supports bf16 just make sure you are using the latest pytorch. 2024-08-01 09:42:17 -04:00
comfyanonymous e2382b6adb Make lowvram less aggressive when there are large amounts of free memory. 2024-08-01 03:58:58 -04:00
comfyanonymous c24f897352 Fix to get fp8 working on T5 base. 2024-07-31 02:00:19 -04:00
comfyanonymous a5991a7aa6 Fix hunyuan dit text encoder weights always being in fp32. 2024-07-31 01:34:57 -04:00
comfyanonymous 2c038ccef0 Lower CLIP memory usage by a bit. 2024-07-31 01:32:35 -04:00
comfyanonymous b85216a3c0 Lower T5 memory usage by a few hundred MB. 2024-07-31 00:52:34 -04:00
comfyanonymous 82cae45d44 Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
comfyanonymous 25853d0be8 Use common function for casting weights to input. 2024-07-30 10:49:14 -04:00
comfyanonymous 79040635da Remove unnecessary code. 2024-07-30 05:01:34 -04:00
comfyanonymous 66d35c07ce Improve artifacts on hydit, auraflow and SD3 on specific resolutions.
This breaks seeds for resolutions that are not a multiple of 16 in pixel
resolution by using circular padding instead of reflection padding but
should lower the amount of artifacts when doing img2img at those
resolutions.
2024-07-29 20:48:50 -04:00
comfyanonymous 4ba7fa0244 Refactor: Move sd2_clip.py to text_encoders folder. 2024-07-28 01:19:20 -04:00
comfyanonymous cf4418b806 Don't treat Bert model like CLIP.
Bert can accept up to 512 tokens so any prompt with more than 77 should
just be passed to it as is instead of splitting it up like CLIP.
2024-07-26 13:08:12 -04:00
comfyanonymous 8328a2d8cd Let hunyuan dit work with all prompt lengths. 2024-07-26 12:11:32 -04:00
comfyanonymous afe732bef9 Hunyuan dit can now accept longer prompts. 2024-07-26 11:52:58 -04:00
comfyanonymous a9ac56fc0d Own BertModel implementation that works with lowvram. 2024-07-26 04:47:17 -04:00
comfyanonymous 25b51b1a8b Hunyuan DiT lora support. 2024-07-25 22:42:54 -04:00
comfyanonymous a5f4292f9f
Basic hunyuan dit implementation. (#4102)
* Let tokenizers return weights to be stored in the saved checkpoint.

* Basic hunyuan dit implementation.

* Fix some resolutions not working.

* Support hydit checkpoint save.

* Init with right dtype.

* Switch to optimized attention in pooler.

* Fix black images on hunyuan dit.
2024-07-25 18:21:08 -04:00
comfyanonymous f87810cd3e Let tokenizers return weights to be stored in the saved checkpoint. 2024-07-25 10:52:09 -04:00
comfyanonymous 10c919f4c7 Make it possible to load tokenizer data from checkpoints. 2024-07-24 16:43:53 -04:00