comfyanonymous | 2f9d6a97ec | Add --deterministic option to make pytorch use deterministic algorithms. | 2023-12-17 16:59:21 -05:00 |
comfyanonymous | b0aab1e4ea | Add an option --fp16-unet to force using fp16 for the unet. | 2023-12-11 18:36:29 -05:00 |
comfyanonymous | ba07cb748e | Use faster manual cast for fp8 in unet. | 2023-12-11 18:24:44 -05:00 |
comfyanonymous | 57926635e8 | Switch text encoder to manual cast. Use fp16 text encoder weights for CPU inference to lower memory usage. | 2023-12-10 23:00:54 -05:00 |
comfyanonymous | 340177e6e8 | Disable non blocking on mps. | 2023-12-10 01:30:35 -05:00 |
comfyanonymous | 9ac0b487ac | Make --gpu-only put intermediate values in GPU memory instead of cpu. | 2023-12-08 02:35:45 -05:00 |
comfyanonymous | 2db86b4676 | Slightly faster lora applying. | 2023-12-06 05:13:14 -05:00 |
comfyanonymous | ca82ade765 | Use .itemsize to get dtype size for fp8. | 2023-12-04 11:52:06 -05:00 |
comfyanonymous | 31b0f6f3d8 | UNET weights can now be stored in fp8. --fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats supported by pytorch. | 2023-12-04 11:10:00 -05:00 |
comfyanonymous | 0cf4e86939 | Add some command line arguments to store text encoder weights in fp8. Pytorch supports two variants of fp8: --fp8_e4m3fn-text-enc (the one that seems to give better results), --fp8_e5m2-text-enc | 2023-11-17 02:56:59 -05:00 |
comfyanonymous | 7339479b10 | Disable xformers when it can't load properly. | 2023-11-13 12:31:10 -05:00 |
comfyanonymous | dd4ba68b6e | Allow different models to estimate memory usage differently. | 2023-11-12 04:03:52 -05:00 |
comfyanonymous | 8594c8be4d | Empty the cache when torch cache is more than 25% free mem. | 2023-10-22 13:58:12 -04:00 |
comfyanonymous | c8013f73e5 | Add some Quadro cards to the list of cards with broken fp16. | 2023-10-16 16:48:46 -04:00 |
comfyanonymous | fd4c5f07e7 | Add a --bf16-unet to test running the unet in bf16. | 2023-10-13 14:51:10 -04:00 |
comfyanonymous | 9a55dadb4c | Refactor code so model can be a dtype other than fp32 or fp16. | 2023-10-13 14:41:17 -04:00 |
comfyanonymous | 88733c997f | pytorch_attention_enabled can now return True when xformers is enabled. | 2023-10-11 21:30:57 -04:00 |
comfyanonymous | 20d3852aa1 | Pull some small changes from the other repo. | 2023-10-11 20:38:48 -04:00 |
Simon Lui | eec449ca8e | Allow Intel GPUs to LoRA cast on GPU since it supports BF16 natively. | 2023-09-22 21:11:27 -07:00 |
comfyanonymous | 1cdfb3dba4 | Only do the cast on the device if the device supports it. | 2023-09-20 17:52:41 -04:00 |
comfyanonymous | 321c5fa295 | Enable pytorch attention by default on xpu. | 2023-09-17 04:09:19 -04:00 |
comfyanonymous | 0966d3ce82 | Don't run text encoders on xpu because there are issues. | 2023-09-14 12:16:07 -04:00 |
comfyanonymous | 1938f5c5fe | Add a force argument to soft_empty_cache to force a cache empty. | 2023-09-04 00:58:18 -04:00 |
Simon Lui | 4a0c4ce4ef | Some fixes to generalize CUDA specific functionality to Intel or other GPUs. | 2023-09-02 18:22:10 -07:00 |
comfyanonymous | b8c7c770d3 | Enable bf16-vae by default on ampere and up. | 2023-08-27 23:06:19 -04:00 |
comfyanonymous | a57b0c797b | Fix lowvram model merging. | 2023-08-26 11:52:07 -04:00 |
comfyanonymous | f72780a7e3 | The new smart memory management makes this unnecessary. | 2023-08-25 18:02:15 -04:00 |
comfyanonymous | 30eb92c3cb | Code cleanups. | 2023-08-24 19:39:18 -04:00 |
comfyanonymous | 51dde87e97 | Try to free enough vram for control lora inference. | 2023-08-24 17:20:54 -04:00 |
comfyanonymous | cc44ade79e | Always shift text encoder to GPU when the device supports fp16. | 2023-08-23 21:45:00 -04:00 |
comfyanonymous | a6ef08a46a | Even with forced fp16 the cpu device should never use it. | 2023-08-23 21:38:28 -04:00 |
comfyanonymous | f081017c1a | Save memory by storing text encoder weights in fp16 in most situations. Do inference in fp32 to make sure quality stays the exact same. | 2023-08-23 01:08:51 -04:00 |
comfyanonymous | 0d7b0a4dc7 | Small cleanups. | 2023-08-20 14:56:47 -04:00 |
Simon Lui | 9225465975 | Further tuning and fix mem_free_total. | 2023-08-20 14:19:53 -04:00 |
Simon Lui | 2c096e4260 | Add ipex optimize and other enhancements for Intel GPUs based on recent memory changes. | 2023-08-20 14:19:51 -04:00 |
comfyanonymous | e9469e732d | --disable-smart-memory now disables loading model directly to vram. | 2023-08-20 04:00:53 -04:00 |
comfyanonymous | 3aee33b54e | Add --disable-smart-memory for those that want the old behaviour. | 2023-08-17 03:12:37 -04:00 |
comfyanonymous | 2be2742711 | Fix issue with regular torch version. | 2023-08-17 01:58:54 -04:00 |
comfyanonymous | 89a0767abf | Smarter memory management. Try to keep models on the vram when possible. Better lowvram mode for controlnets. | 2023-08-17 01:06:34 -04:00 |
comfyanonymous | 1ce0d8ad68 | Add CMP 30HX card to the nvidia_16_series list. | 2023-08-04 12:08:45 -04:00 |
comfyanonymous | 4a77fcd6ab | Only shift text encoder to vram when CPU cores are under 8. | 2023-07-31 00:08:54 -04:00 |
comfyanonymous | 3cd31d0e24 | Lower CPU thread check for running the text encoder on the CPU vs GPU. | 2023-07-30 17:18:24 -04:00 |
comfyanonymous | 22f29d66ca | Try to fix memory issue with lora. | 2023-07-22 21:38:56 -04:00 |
comfyanonymous | 4760c29380 | Merge branch 'fix-AttributeError-module-'torch'-has-no-attribute-'mps'' of https://github.com/KarryCharon/ComfyUI | 2023-07-20 00:34:54 -04:00 |
comfyanonymous | 18885f803a | Add MX450 and MX550 to list of cards with broken fp16. | 2023-07-19 03:08:30 -04:00 |
comfyanonymous | ff6b047a74 | Fix device print on old torch version. | 2023-07-17 15:18:58 -04:00 |
comfyanonymous | 1679abd86d | Add a command line argument to enable backend:cudaMallocAsync | 2023-07-17 11:00:14 -04:00 |
comfyanonymous | 5f57362613 | Lower lora ram usage when in normal vram mode. | 2023-07-16 02:59:04 -04:00 |
comfyanonymous | 490771b7f4 | Speed up lora loading a bit. | 2023-07-15 13:25:22 -04:00 |
KarryCharon | 3e2309f149 | fix mps miss import | 2023-07-12 10:06:34 +08:00 |