Simon Lui
f509c6fe21
Fix Intel GPU memory allocation accuracy and documentation update. ( #3459 )
...
* Change calculation of memory total to be more accurate, allocated is actually smaller than reserved.
* Update README.md install documentation for Intel GPUs.
2024-05-12 06:36:30 -04:00
comfyanonymous
fa6dd7e5bb
Fix lowvram issue with saving checkpoints.
...
The previous fix didn't cover the case where the model was loaded in
lowvram mode right before.
2024-05-12 06:13:45 -04:00
comfyanonymous
49c20cdc70
No longer necessary.
2024-05-12 05:34:43 -04:00
comfyanonymous
e1489ad257
Fix issue with lowvram mode breaking model saving.
2024-05-11 21:55:20 -04:00
Simon Lui
a56d02efc7
Change torch.xpu to ipex.optimize, xpu device initialization and remove workaround for text node issue from older IPEX. ( #3388 )
2024-05-02 03:26:50 -04:00
comfyanonymous
258dbc06c3
Fix some memory related issues.
2024-04-14 12:08:58 -04:00
comfyanonymous
0a03009808
Fix issue with controlnet models getting loaded multiple times.
2024-04-06 18:38:39 -04:00
comfyanonymous
5d8898c056
Fix some performance issues with weight loading and unloading.
...
Lower peak memory usage when changing model.
Fix case where model weights would be unloaded and reloaded.
2024-03-28 18:04:42 -04:00
comfyanonymous
c6de09b02e
Optimize memory unload strategy for more optimized performance.
2024-03-24 02:36:30 -04:00
comfyanonymous
4b9005e949
Fix regression with model merging.
2024-03-20 13:56:12 -04:00
comfyanonymous
c18a203a8a
Don't unload model weights for non weight patches.
2024-03-20 02:27:58 -04:00
comfyanonymous
db8b59ecff
Lower memory usage for loras in lowvram mode at the cost of perf.
2024-03-13 20:07:27 -04:00
comfyanonymous
0ed72befe1
Change log levels.
...
Logging level now defaults to info. --verbose sets it to debug.
2024-03-11 13:54:56 -04:00
comfyanonymous
65397ce601
Replace prints with logging and add --verbose argument.
2024-03-10 12:14:23 -04:00
comfyanonymous
dce3555339
Add some tesla pascal GPUs to the fp16 working but slower list.
2024-03-02 17:16:31 -05:00
comfyanonymous
88f300401c
Enable fp16 by default on mps.
2024-02-19 12:00:48 -05:00
comfyanonymous
929e266f3e
Manual cast for bf16 on older GPUs.
2024-02-17 09:01:17 -05:00
comfyanonymous
0b3c50480c
Make --force-fp32 disable loading models in bf16.
2024-02-16 23:01:54 -05:00
comfyanonymous
f83109f09b
Stable Cascade Stage C.
2024-02-16 10:55:08 -05:00
comfyanonymous
aeaeca10bd
Small refactor of is_device_* functions.
2024-02-15 21:10:10 -05:00
comfyanonymous
66e28ef45c
Don't use is_bf16_supported to check for fp16 support.
2024-02-04 20:53:35 -05:00
comfyanonymous
24129d78e6
Speed up SDXL on 16xx series with fp16 weights and manual cast.
2024-02-04 13:23:43 -05:00
comfyanonymous
4b0239066d
Always use fp16 for the text encoders.
2024-02-02 10:02:49 -05:00
comfyanonymous
f9e55d8463
Only auto enable bf16 VAE on nvidia GPUs that actually support it.
2024-01-15 03:10:22 -05:00
comfyanonymous
1b103e0cb2
Add argument to run the VAE on the CPU.
2023-12-30 05:49:07 -05:00
comfyanonymous
e1e322cf69
Load weights that can't be lowvramed to target device.
2023-12-28 21:41:10 -05:00
comfyanonymous
a252963f95
--disable-smart-memory now unloads everything like it did originally.
2023-12-23 04:25:06 -05:00
comfyanonymous
36a7953142
Greatly improve lowvram sampling speed by getting rid of accelerate.
...
Let me know if this breaks anything.
2023-12-22 14:38:45 -05:00
comfyanonymous
2f9d6a97ec
Add --deterministic option to make pytorch use deterministic algorithms.
2023-12-17 16:59:21 -05:00
comfyanonymous
b0aab1e4ea
Add an option --fp16-unet to force using fp16 for the unet.
2023-12-11 18:36:29 -05:00
comfyanonymous
ba07cb748e
Use faster manual cast for fp8 in unet.
2023-12-11 18:24:44 -05:00
comfyanonymous
57926635e8
Switch text encoder to manual cast.
...
Use fp16 text encoder weights for CPU inference to lower memory usage.
2023-12-10 23:00:54 -05:00
comfyanonymous
340177e6e8
Disable non blocking on mps.
2023-12-10 01:30:35 -05:00
comfyanonymous
9ac0b487ac
Make --gpu-only put intermediate values in GPU memory instead of cpu.
2023-12-08 02:35:45 -05:00
comfyanonymous
2db86b4676
Slightly faster lora applying.
2023-12-06 05:13:14 -05:00
comfyanonymous
ca82ade765
Use .itemsize to get dtype size for fp8.
2023-12-04 11:52:06 -05:00
comfyanonymous
31b0f6f3d8
UNET weights can now be stored in fp8.
...
--fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats
supported by pytorch.
2023-12-04 11:10:00 -05:00
comfyanonymous
0cf4e86939
Add some command line arguments to store text encoder weights in fp8.
...
Pytorch supports two variants of fp8:
--fp8_e4m3fn-text-enc (the one that seems to give better results)
--fp8_e5m2-text-enc
2023-11-17 02:56:59 -05:00
comfyanonymous
7339479b10
Disable xformers when it can't load properly.
2023-11-13 12:31:10 -05:00
comfyanonymous
dd4ba68b6e
Allow different models to estimate memory usage differently.
2023-11-12 04:03:52 -05:00
comfyanonymous
8594c8be4d
Empty the cache when torch cache is more than 25% free mem.
2023-10-22 13:58:12 -04:00
comfyanonymous
c8013f73e5
Add some Quadro cards to the list of cards with broken fp16.
2023-10-16 16:48:46 -04:00
comfyanonymous
fd4c5f07e7
Add a --bf16-unet to test running the unet in bf16.
2023-10-13 14:51:10 -04:00
comfyanonymous
9a55dadb4c
Refactor code so model can be a dtype other than fp32 or fp16.
2023-10-13 14:41:17 -04:00
comfyanonymous
88733c997f
pytorch_attention_enabled can now return True when xformers is enabled.
2023-10-11 21:30:57 -04:00
comfyanonymous
20d3852aa1
Pull some small changes from the other repo.
2023-10-11 20:38:48 -04:00
Simon Lui
eec449ca8e
Allow Intel GPUs to LoRA cast on GPU since it supports BF16 natively.
2023-09-22 21:11:27 -07:00
comfyanonymous
1cdfb3dba4
Only do the cast on the device if the device supports it.
2023-09-20 17:52:41 -04:00
comfyanonymous
321c5fa295
Enable pytorch attention by default on xpu.
2023-09-17 04:09:19 -04:00
comfyanonymous
0966d3ce82
Don't run text encoders on xpu because there are issues.
2023-09-14 12:16:07 -04:00