162 Commits

Author SHA1 Message Date
comfyanonymous 0ec513d877 Add a --force-channels-last to inference models in channel last mode. 2024-06-15 01:08:12 -04:00
Simon Lui 5eb98f0092 Exempt IPEX from non_blocking previews fixing segmentation faults. (#3708) 2024-06-13 18:51:14 -04:00
comfyanonymous 0e49211a11 Load the SD3 T5xxl model in the same dtype stored in the checkpoint. 2024-06-11 17:03:26 -04:00
comfyanonymous 104fcea0c8 Add function to get the list of currently loaded models. 2024-06-05 23:25:16 -04:00
comfyanonymous b1fd26fe9e pytorch xpu should be flash or mem efficient attention? 2024-06-04 17:44:14 -04:00
comfyanonymous b249862080 Add an annoying print to a function I want to remove. 2024-06-01 12:47:31 -04:00
comfyanonymous bf3e334d46 Disable non_blocking when --deterministic or directml. 2024-05-30 11:07:38 -04:00
comfyanonymous 0920e0e5fe Remove some unused imports. 2024-05-27 19:08:27 -04:00
comfyanonymous 6c23854f54 Fix OSX latent2rgb previews. 2024-05-22 13:56:28 -04:00
comfyanonymous 8508df2569 Work around black image bug on Mac 14.5 by forcing attention upcasting. 2024-05-21 16:56:33 -04:00
comfyanonymous 09e069ae6c Log the pytorch version. 2024-05-20 06:22:29 -04:00
comfyanonymous 19300655dd Don't automatically switch to lowvram mode on GPUs with low memory. 2024-05-17 00:31:32 -04:00
Simon Lui f509c6fe21 Fix Intel GPU memory allocation accuracy and documentation update. (#3459)
* Change calculation of memory total to be more accurate, allocated is actually smaller than reserved.

* Update README.md install documentation for Intel GPUs.
2024-05-12 06:36:30 -04:00
comfyanonymous fa6dd7e5bb Fix lowvram issue with saving checkpoints.
The previous fix didn't cover the case where the model was loaded in
lowvram mode right before.
2024-05-12 06:13:45 -04:00
comfyanonymous 49c20cdc70 No longer necessary. 2024-05-12 05:34:43 -04:00
comfyanonymous e1489ad257 Fix issue with lowvram mode breaking model saving. 2024-05-11 21:55:20 -04:00
Simon Lui a56d02efc7 Change torch.xpu to ipex.optimize, xpu device initialization and remove workaround for text node issue from older IPEX. (#3388) 2024-05-02 03:26:50 -04:00
comfyanonymous 258dbc06c3 Fix some memory related issues. 2024-04-14 12:08:58 -04:00
comfyanonymous 0a03009808 Fix issue with controlnet models getting loaded multiple times. 2024-04-06 18:38:39 -04:00
comfyanonymous 5d8898c056 Fix some performance issues with weight loading and unloading.
Lower peak memory usage when changing model.

Fix case where model weights would be unloaded and reloaded.
2024-03-28 18:04:42 -04:00
comfyanonymous c6de09b02e Optimize memory unload strategy for more optimized performance. 2024-03-24 02:36:30 -04:00
comfyanonymous 4b9005e949 Fix regression with model merging. 2024-03-20 13:56:12 -04:00
comfyanonymous c18a203a8a Don't unload model weights for non weight patches. 2024-03-20 02:27:58 -04:00
comfyanonymous db8b59ecff Lower memory usage for loras in lowvram mode at the cost of perf. 2024-03-13 20:07:27 -04:00
comfyanonymous 0ed72befe1 Change log levels.
Logging level now defaults to info. --verbose sets it to debug.
2024-03-11 13:54:56 -04:00
comfyanonymous 65397ce601 Replace prints with logging and add --verbose argument. 2024-03-10 12:14:23 -04:00
comfyanonymous dce3555339 Add some tesla pascal GPUs to the fp16 working but slower list. 2024-03-02 17:16:31 -05:00
comfyanonymous 88f300401c Enable fp16 by default on mps. 2024-02-19 12:00:48 -05:00
comfyanonymous 929e266f3e Manual cast for bf16 on older GPUs. 2024-02-17 09:01:17 -05:00
comfyanonymous 0b3c50480c Make --force-fp32 disable loading models in bf16. 2024-02-16 23:01:54 -05:00
comfyanonymous f83109f09b Stable Cascade Stage C. 2024-02-16 10:55:08 -05:00
comfyanonymous aeaeca10bd Small refactor of is_device_* functions. 2024-02-15 21:10:10 -05:00
comfyanonymous 66e28ef45c Don't use is_bf16_supported to check for fp16 support. 2024-02-04 20:53:35 -05:00
comfyanonymous 24129d78e6 Speed up SDXL on 16xx series with fp16 weights and manual cast. 2024-02-04 13:23:43 -05:00
comfyanonymous 4b0239066d Always use fp16 for the text encoders. 2024-02-02 10:02:49 -05:00
comfyanonymous f9e55d8463 Only auto enable bf16 VAE on nvidia GPUs that actually support it. 2024-01-15 03:10:22 -05:00
comfyanonymous 1b103e0cb2 Add argument to run the VAE on the CPU. 2023-12-30 05:49:07 -05:00
comfyanonymous e1e322cf69 Load weights that can't be lowvramed to target device. 2023-12-28 21:41:10 -05:00
comfyanonymous a252963f95 --disable-smart-memory now unloads everything like it did originally. 2023-12-23 04:25:06 -05:00
comfyanonymous 36a7953142 Greatly improve lowvram sampling speed by getting rid of accelerate.
Let me know if this breaks anything.
2023-12-22 14:38:45 -05:00
comfyanonymous 2f9d6a97ec Add --deterministic option to make pytorch use deterministic algorithms. 2023-12-17 16:59:21 -05:00
comfyanonymous b0aab1e4ea Add an option --fp16-unet to force using fp16 for the unet. 2023-12-11 18:36:29 -05:00
comfyanonymous ba07cb748e Use faster manual cast for fp8 in unet. 2023-12-11 18:24:44 -05:00
comfyanonymous 57926635e8 Switch text encoder to manual cast.
Use fp16 text encoder weights for CPU inference to lower memory usage.
2023-12-10 23:00:54 -05:00
comfyanonymous 340177e6e8 Disable non blocking on mps. 2023-12-10 01:30:35 -05:00
comfyanonymous 9ac0b487ac Make --gpu-only put intermediate values in GPU memory instead of cpu. 2023-12-08 02:35:45 -05:00
comfyanonymous 2db86b4676 Slightly faster lora applying. 2023-12-06 05:13:14 -05:00
comfyanonymous ca82ade765 Use .itemsize to get dtype size for fp8. 2023-12-04 11:52:06 -05:00
comfyanonymous 31b0f6f3d8 UNET weights can now be stored in fp8.
--fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats
supported by pytorch.
2023-12-04 11:10:00 -05:00
comfyanonymous 0cf4e86939 Add some command line arguments to store text encoder weights in fp8.
Pytorch supports two variants of fp8:
--fp8_e4m3fn-text-enc (the one that seems to give better results)
--fp8_e5m2-text-enc
2023-11-17 02:56:59 -05:00
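
Commits 65397ce601 and 0ed72befe1 above replace prints with logging, make the log level default to info, and let --verbose switch it to debug. The following is only a minimal sketch of that behavior using the Python standard library, not the project's actual code. The function names resolve_log_level and configure_logging are hypothetical; only the --verbose flag and the info/debug levels come from the commit messages.

```python
import argparse
import logging


def resolve_log_level(argv=None):
    """Return logging.DEBUG if --verbose is present, otherwise logging.INFO.

    Hypothetical helper: the name and structure are assumptions made for
    illustration. Only the flag name and the two levels come from the log.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--verbose", action="store_true")
    # parse_known_args ignores any other flags the real program may define.
    args, _unknown = parser.parse_known_args(argv)
    return logging.DEBUG if args.verbose else logging.INFO


def configure_logging(argv=None):
    """Configure the root logger with the resolved level and return it."""
    level = resolve_log_level(argv)
    logging.basicConfig(level=level)
    return level
```

Calling configure_logging(["--verbose"]) would enable debug output, while configure_logging([]) keeps the info default described in 0ed72befe1.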