Commit Graph

143 Commits

Author SHA1 Message Date
comfyanonymous 5d8898c056 Fix some performance issues with weight loading and unloading.
Lower peak memory usage when changing models.

Fix a case where model weights would be unloaded and reloaded.
2024-03-28 18:04:42 -04:00
comfyanonymous c6de09b02e Optimize the memory unload strategy for better performance. 2024-03-24 02:36:30 -04:00
comfyanonymous 4b9005e949 Fix regression with model merging. 2024-03-20 13:56:12 -04:00
comfyanonymous c18a203a8a Don't unload model weights for non-weight patches. 2024-03-20 02:27:58 -04:00
comfyanonymous db8b59ecff Lower memory usage for loras in lowvram mode at the cost of performance. 2024-03-13 20:07:27 -04:00
comfyanonymous 0ed72befe1 Change log levels.
Logging level now defaults to info. --verbose sets it to debug.
2024-03-11 13:54:56 -04:00
comfyanonymous 65397ce601 Replace prints with logging and add --verbose argument. 2024-03-10 12:14:23 -04:00
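The two logging commits above (65397ce601 and 0ed72befe1) describe the behavior: info level by default, debug with --verbose. A minimal sketch of that wiring, using plain argparse/logging rather than ComfyUI's actual cli_args module:

```python
# Minimal sketch, not ComfyUI's actual code: INFO by default, DEBUG with --verbose.
import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("--verbose", action="store_true", help="enable debug logging")
args = parser.parse_args()

logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
logging.debug("only visible with --verbose")
logging.info("visible by default")
```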
comfyanonymous dce3555339 Add some Tesla Pascal GPUs to the list of GPUs where fp16 works but is slower. 2024-03-02 17:16:31 -05:00
comfyanonymous 88f300401c Enable fp16 by default on mps. 2024-02-19 12:00:48 -05:00
comfyanonymous 929e266f3e Manual cast for bf16 on older GPUs. 2024-02-17 09:01:17 -05:00
comfyanonymous 0b3c50480c Make --force-fp32 disable loading models in bf16. 2024-02-16 23:01:54 -05:00
comfyanonymous f83109f09b Stable Cascade Stage C. 2024-02-16 10:55:08 -05:00
comfyanonymous aeaeca10bd Small refactor of is_device_* functions. 2024-02-15 21:10:10 -05:00
comfyanonymous 66e28ef45c Don't use is_bf16_supported to check for fp16 support. 2024-02-04 20:53:35 -05:00
comfyanonymous 24129d78e6 Speed up SDXL on 16xx series with fp16 weights and manual cast. 2024-02-04 13:23:43 -05:00
comfyanonymous 4b0239066d Always use fp16 for the text encoders. 2024-02-02 10:02:49 -05:00
comfyanonymous f9e55d8463 Only auto-enable the bf16 VAE on nvidia GPUs that actually support it. 2024-01-15 03:10:22 -05:00
comfyanonymous 1b103e0cb2 Add argument to run the VAE on the CPU. 2023-12-30 05:49:07 -05:00
comfyanonymous e1e322cf69 Load weights that can't be lowvramed to the target device. 2023-12-28 21:41:10 -05:00
comfyanonymous a252963f95 --disable-smart-memory now unloads everything like it did originally. 2023-12-23 04:25:06 -05:00
comfyanonymous 36a7953142 Greatly improve lowvram sampling speed by getting rid of accelerate.
Let me know if this breaks anything.
2023-12-22 14:38:45 -05:00
comfyanonymous 2f9d6a97ec Add --deterministic option to make pytorch use deterministic algorithms. 2023-12-17 16:59:21 -05:00
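For context on 2f9d6a97ec: opting into deterministic algorithms in PyTorch generally involves the calls below. This illustrates the PyTorch side only; the exact wiring behind --deterministic may differ.

```python
# Illustrative only: what enabling deterministic PyTorch typically involves.
import os
import torch

# cuBLAS needs a workspace config set before CUDA work starts for some ops
# to run deterministically.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

torch.use_deterministic_algorithms(True)  # raise on known non-deterministic ops
torch.backends.cudnn.benchmark = False    # disable non-deterministic autotuning
```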
comfyanonymous b0aab1e4ea Add an option --fp16-unet to force using fp16 for the unet. 2023-12-11 18:36:29 -05:00
comfyanonymous ba07cb748e Use faster manual cast for fp8 in unet. 2023-12-11 18:24:44 -05:00
comfyanonymous 57926635e8 Switch text encoder to manual cast.
Use fp16 text encoder weights for CPU inference to lower memory usage.
2023-12-10 23:00:54 -05:00
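"Manual cast" in 57926635e8 (and the related unet commit ba07cb748e) means keeping weights stored in a compact dtype and casting them to the compute dtype on the fly during the forward pass. A rough, hypothetical sketch of the idea, not ComfyUI's actual ops classes:

```python
# Hypothetical sketch of manual casting: weights stay in a small storage dtype,
# and each forward call casts them to the compute dtype before the matmul.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ManualCastLinear(nn.Linear):
    compute_dtype = torch.float16  # assumed compute dtype; fp32 on CPU would also fit

    def forward(self, x):
        weight = self.weight.to(self.compute_dtype)
        bias = self.bias.to(self.compute_dtype) if self.bias is not None else None
        return F.linear(x.to(self.compute_dtype), weight, bias)
```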
comfyanonymous 340177e6e8 Disable non-blocking on mps. 2023-12-10 01:30:35 -05:00
comfyanonymous 9ac0b487ac Make --gpu-only put intermediate values in GPU memory instead of CPU memory. 2023-12-08 02:35:45 -05:00
comfyanonymous 2db86b4676 Slightly faster lora applying. 2023-12-06 05:13:14 -05:00
comfyanonymous ca82ade765 Use .itemsize to get dtype size for fp8. 2023-12-04 11:52:06 -05:00
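Commit ca82ade765 reads the element size straight off the dtype, which also covers the fp8 types. A tiny illustration (requires a PyTorch build that has the fp8 dtypes and dtype.itemsize):

```python
import torch

# torch.dtype.itemsize gives the per-element size in bytes, including fp8.
print(torch.float8_e4m3fn.itemsize)  # 1
print(torch.float16.itemsize)        # 2
print(torch.float32.itemsize)        # 4
```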
comfyanonymous 31b0f6f3d8 UNET weights can now be stored in fp8.
--fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats
supported by pytorch.
2023-12-04 11:10:00 -05:00
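As a hedged illustration of 31b0f6f3d8: the two flags correspond to PyTorch's two fp8 dtypes, and weights stored in fp8 are upcast to a compute dtype before the actual math (general fp8 matmuls are not available). The flag-to-dtype mapping below is illustrative, not the actual argument parsing:

```python
# Illustrative: the two fp8 storage formats and the store-small / compute-big pattern.
import torch

FP8_FLAG_TO_DTYPE = {
    "--fp8_e4m3fn-unet": torch.float8_e4m3fn,
    "--fp8_e5m2-unet": torch.float8_e5m2,
}

w = torch.randn(4, 4)                   # original fp32 weight
w_stored = w.to(torch.float8_e4m3fn)    # 1 byte per element in memory
w_compute = w_stored.to(torch.float16)  # upcast before the actual matmul
```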
comfyanonymous 0cf4e86939 Add some command line arguments to store text encoder weights in fp8.
Pytorch supports two variants of fp8:
--fp8_e4m3fn-text-enc (the one that seems to give better results)
--fp8_e5m2-text-enc
2023-11-17 02:56:59 -05:00
comfyanonymous 7339479b10 Disable xformers when it can't load properly. 2023-11-13 12:31:10 -05:00
comfyanonymous dd4ba68b6e Allow different models to estimate memory usage differently. 2023-11-12 04:03:52 -05:00
comfyanonymous 8594c8be4d Empty the cache when the torch cache takes up more than 25% of free memory. 2023-10-22 13:58:12 -04:00
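A hedged reading of 8594c8be4d's heuristic: compare the memory torch keeps cached but unused against the free memory reported by the driver, and release the cache past a threshold. A sketch under that assumption (the real condition may differ):

```python
import torch

def maybe_empty_cache(threshold=0.25):
    """Release the CUDA caching allocator's unused memory if it exceeds
    `threshold` of the free device memory. Illustrative heuristic only."""
    if not torch.cuda.is_available():
        return
    free_mem, _total = torch.cuda.mem_get_info()
    cached_unused = torch.cuda.memory_reserved() - torch.cuda.memory_allocated()
    if cached_unused > threshold * free_mem:
        torch.cuda.empty_cache()
```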
comfyanonymous c8013f73e5 Add some Quadro cards to the list of cards with broken fp16. 2023-10-16 16:48:46 -04:00
comfyanonymous fd4c5f07e7 Add a --bf16-unet option to test running the unet in bf16. 2023-10-13 14:51:10 -04:00
comfyanonymous 9a55dadb4c Refactor code so model can be a dtype other than fp32 or fp16. 2023-10-13 14:41:17 -04:00
comfyanonymous 88733c997f pytorch_attention_enabled can now return True when xformers is enabled. 2023-10-11 21:30:57 -04:00
comfyanonymous 20d3852aa1 Pull some small changes from the other repo. 2023-10-11 20:38:48 -04:00
Simon Lui eec449ca8e Allow Intel GPUs to do LoRA casting on the GPU since they support BF16 natively. 2023-09-22 21:11:27 -07:00
comfyanonymous 1cdfb3dba4 Only do the cast on the device if the device supports it. 2023-09-20 17:52:41 -04:00
comfyanonymous 321c5fa295 Enable pytorch attention by default on xpu. 2023-09-17 04:09:19 -04:00
comfyanonymous 0966d3ce82 Don't run text encoders on xpu because there are issues. 2023-09-14 12:16:07 -04:00
comfyanonymous 1938f5c5fe Add a force argument to soft_empty_cache to force a cache empty. 2023-09-04 00:58:18 -04:00
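soft_empty_cache is the helper that wraps the backend cache-clear calls; a simplified sketch of what adding a force parameter could look like (not the exact implementation, and the skip condition here is a placeholder):

```python
import torch

def soft_empty_cache(force=False):
    """Simplified sketch: empty the CUDA cache, with force=True bypassing
    the "is it worth it" check. The check below is made up for illustration."""
    if not torch.cuda.is_available():
        return
    if not force and torch.cuda.memory_reserved() == torch.cuda.memory_allocated():
        return  # nothing cached beyond what is in use
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()
```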
Simon Lui 4a0c4ce4ef Some fixes to generalize CUDA-specific functionality to Intel and other GPUs. 2023-09-02 18:22:10 -07:00
comfyanonymous b8c7c770d3 Enable bf16-vae by default on ampere and up. 2023-08-27 23:06:19 -04:00
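"Ampere and up" in b8c7c770d3 maps to CUDA compute capability 8.0 or higher, which is one plausible way to gate the bf16 VAE default; a tiny illustrative check:

```python
import torch

def bf16_vae_default():
    # Illustrative gate: Ampere (compute capability 8.x) and newer NVIDIA GPUs
    # have native bf16 support, so the bf16 VAE can default to on there.
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 8
```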
comfyanonymous a57b0c797b Fix lowvram model merging. 2023-08-26 11:52:07 -04:00
comfyanonymous f72780a7e3 The new smart memory management makes this unnecessary. 2023-08-25 18:02:15 -04:00
comfyanonymous 30eb92c3cb Code cleanups. 2023-08-24 19:39:18 -04:00
comfyanonymous 51dde87e97 Try to free enough vram for control lora inference. 2023-08-24 17:20:54 -04:00