Commit Graph

214 Commits

Author SHA1 Message Date
comfyanonymous 31b0f6f3d8 UNET weights can now be stored in fp8.
--fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats
supported by pytorch.
2023-12-04 11:10:00 -05:00
comfyanonymous 0cf4e86939 Add some command line arguments to store text encoder weights in fp8.
Pytorch supports two variants of fp8:
--fp8_e4m3fn-text-enc (the one that seems to give better results)
--fp8_e5m2-text-enc
2023-11-17 02:56:59 -05:00
comfyanonymous 7339479b10 Disable xformers when it can't load properly. 2023-11-13 12:31:10 -05:00
comfyanonymous dd4ba68b6e Allow different models to estimate memory usage differently. 2023-11-12 04:03:52 -05:00
comfyanonymous 8594c8be4d Empty the cache when torch cache is more than 25% free mem. 2023-10-22 13:58:12 -04:00
comfyanonymous c8013f73e5 Add some Quadro cards to the list of cards with broken fp16. 2023-10-16 16:48:46 -04:00
comfyanonymous fd4c5f07e7 Add a --bf16-unet to test running the unet in bf16. 2023-10-13 14:51:10 -04:00
comfyanonymous 9a55dadb4c Refactor code so model can be a dtype other than fp32 or fp16. 2023-10-13 14:41:17 -04:00
comfyanonymous 88733c997f pytorch_attention_enabled can now return True when xformers is enabled. 2023-10-11 21:30:57 -04:00
comfyanonymous 20d3852aa1 Pull some small changes from the other repo. 2023-10-11 20:38:48 -04:00
Simon Lui eec449ca8e Allow Intel GPUs to LoRA cast on GPU since it supports BF16 natively. 2023-09-22 21:11:27 -07:00
comfyanonymous 1cdfb3dba4 Only do the cast on the device if the device supports it. 2023-09-20 17:52:41 -04:00
comfyanonymous 321c5fa295 Enable pytorch attention by default on xpu. 2023-09-17 04:09:19 -04:00
comfyanonymous 0966d3ce82 Don't run text encoders on xpu because there are issues. 2023-09-14 12:16:07 -04:00
comfyanonymous 1938f5c5fe Add a force argument to soft_empty_cache to force a cache empty. 2023-09-04 00:58:18 -04:00
Simon Lui 4a0c4ce4ef Some fixes to generalize CUDA specific functionality to Intel or other GPUs. 2023-09-02 18:22:10 -07:00
comfyanonymous b8c7c770d3 Enable bf16-vae by default on ampere and up. 2023-08-27 23:06:19 -04:00
comfyanonymous a57b0c797b Fix lowvram model merging. 2023-08-26 11:52:07 -04:00
comfyanonymous f72780a7e3 The new smart memory management makes this unnecessary. 2023-08-25 18:02:15 -04:00
comfyanonymous 30eb92c3cb Code cleanups. 2023-08-24 19:39:18 -04:00
comfyanonymous 51dde87e97 Try to free enough vram for control lora inference. 2023-08-24 17:20:54 -04:00
comfyanonymous cc44ade79e Always shift text encoder to GPU when the device supports fp16. 2023-08-23 21:45:00 -04:00
comfyanonymous a6ef08a46a Even with forced fp16 the cpu device should never use it. 2023-08-23 21:38:28 -04:00
comfyanonymous f081017c1a Save memory by storing text encoder weights in fp16 in most situations.
Do inference in fp32 to make sure quality stays the exact same.
2023-08-23 01:08:51 -04:00
comfyanonymous 0d7b0a4dc7 Small cleanups. 2023-08-20 14:56:47 -04:00
Simon Lui 9225465975 Further tuning and fix mem_free_total. 2023-08-20 14:19:53 -04:00
Simon Lui 2c096e4260 Add ipex optimize and other enhancements for Intel GPUs based on recent memory changes. 2023-08-20 14:19:51 -04:00
comfyanonymous e9469e732d --disable-smart-memory now disables loading model directly to vram. 2023-08-20 04:00:53 -04:00
comfyanonymous 3aee33b54e Add --disable-smart-memory for those that want the old behaviour. 2023-08-17 03:12:37 -04:00
comfyanonymous 2be2742711 Fix issue with regular torch version. 2023-08-17 01:58:54 -04:00
comfyanonymous 89a0767abf Smarter memory management.
Try to keep models on the vram when possible.

Better lowvram mode for controlnets.
2023-08-17 01:06:34 -04:00
comfyanonymous 1ce0d8ad68 Add CMP 30HX card to the nvidia_16_series list. 2023-08-04 12:08:45 -04:00
comfyanonymous 4a77fcd6ab Only shift text encoder to vram when CPU cores are under 8. 2023-07-31 00:08:54 -04:00
comfyanonymous 3cd31d0e24 Lower CPU thread check for running the text encoder on the CPU vs GPU. 2023-07-30 17:18:24 -04:00
comfyanonymous 22f29d66ca Try to fix memory issue with lora. 2023-07-22 21:38:56 -04:00
comfyanonymous 4760c29380 Merge branch 'fix-AttributeError-module-'torch'-has-no-attribute-'mps'' of https://github.com/KarryCharon/ComfyUI 2023-07-20 00:34:54 -04:00
comfyanonymous 18885f803a Add MX450 and MX550 to list of cards with broken fp16. 2023-07-19 03:08:30 -04:00
comfyanonymous ff6b047a74 Fix device print on old torch version. 2023-07-17 15:18:58 -04:00
comfyanonymous 1679abd86d Add a command line argument to enable backend:cudaMallocAsync 2023-07-17 11:00:14 -04:00
comfyanonymous 5f57362613 Lower lora ram usage when in normal vram mode. 2023-07-16 02:59:04 -04:00
comfyanonymous 490771b7f4 Speed up lora loading a bit. 2023-07-15 13:25:22 -04:00
KarryCharon 3e2309f149 fix mps miss import 2023-07-12 10:06:34 +08:00
comfyanonymous 0ae81c03bb Empty cache after model unloading for normal vram and lower. 2023-07-09 09:56:03 -04:00
comfyanonymous e7bee85df8 Add arguments to run the VAE in fp16 or bf16 for testing. 2023-07-06 23:23:46 -04:00
comfyanonymous ddc6f12ad5 Disable autocast in unet for increased speed. 2023-07-05 21:58:29 -04:00
comfyanonymous 8d694cc450 Fix issue with OSX. 2023-07-04 02:09:02 -04:00
comfyanonymous dc9d1f31c8 Improvements for OSX. 2023-07-03 00:08:30 -04:00
comfyanonymous 2c4e0b49b7 Switch to fp16 on some cards when the model is too big. 2023-07-02 10:00:57 -04:00
comfyanonymous 6f3d9f52db Add a --force-fp16 argument to force fp16 for testing. 2023-07-01 22:42:35 -04:00
comfyanonymous 1c1b0e7299 --gpu-only now keeps the VAE on the device. 2023-07-01 15:22:40 -04:00
comfyanonymous 3b6fe51c1d Leave text_encoder on the CPU when it can handle it. 2023-07-01 14:38:51 -04:00
comfyanonymous b6a60fa696 Try to keep text encoders loaded and patched to increase speed.
load_model_gpu() is now used with the text encoder models instead of just
the unet.
2023-07-01 13:28:07 -04:00
comfyanonymous 97ee230682 Make highvram and normalvram shift the text encoders to vram and back.
This is faster on big text encoder models than running it on the CPU.
2023-07-01 12:37:23 -04:00
comfyanonymous 62db11683b Move unet to device right after loading on highvram mode. 2023-06-29 20:43:06 -04:00
comfyanonymous 8248babd44 Use pytorch attention by default on nvidia when xformers isn't present.
Add a new argument --use-quad-cross-attention
2023-06-26 13:03:44 -04:00
comfyanonymous f7edcfd927 Add a --gpu-only argument to keep and run everything on the GPU.
Make the CLIP model work on the GPU.
2023-06-15 15:38:52 -04:00
comfyanonymous fed0a4dd29 Some comments to say what the vram state options mean. 2023-06-04 17:51:04 -04:00
comfyanonymous 0a5fefd621 Cleanups and fixes for model_management.py
Hopefully fix regression on MPS and CPU.
2023-06-03 11:05:37 -04:00
comfyanonymous 67892b5ac5 Refactor and improve model_management code related to free memory. 2023-06-02 15:21:33 -04:00
space-nuko 499641ebf1 More accurate total 2023-06-02 00:14:41 -05:00
space-nuko b5dd15c67a System stats endpoint 2023-06-01 23:26:23 -05:00
comfyanonymous 5c38958e49 Tweak lowvram model memory so it's closer to what it was before. 2023-06-01 04:04:35 -04:00
comfyanonymous 94680732d3 Empty cache on mps. 2023-06-01 03:52:51 -04:00
comfyanonymous eb448dd8e1 Auto load model in lowvram if not enough memory. 2023-05-30 12:36:41 -04:00
comfyanonymous 3a1f47764d Print the torch device that is used on startup. 2023-05-13 17:11:27 -04:00
comfyanonymous 6fc4917634 Make maximum_batch_area take into account python2.0 attention function.
More conservative xformers maximum_batch_area.
2023-05-06 19:58:54 -04:00
comfyanonymous 678f933d38 maximum_batch_area for xformers.
Remove useless code.
2023-05-06 19:28:46 -04:00
comfyanonymous cb1551b819 Lowvram mode for gligen and fix some lowvram issues. 2023-05-05 18:11:41 -04:00
comfyanonymous 6ee11d7bc0 Fix import. 2023-05-05 00:19:35 -04:00
comfyanonymous bae4fb4a9d Fix imports. 2023-05-04 18:10:29 -04:00
comfyanonymous 056e5545ff Don't try to get vram from xpu or cuda when directml is enabled. 2023-04-29 00:28:48 -04:00
comfyanonymous 2ca934f7d4 You can now select the device index with: --directml id
Like this for example: --directml 1
2023-04-28 16:51:35 -04:00
comfyanonymous 3baded9892 Basic torch_directml support. Use --directml to use it. 2023-04-28 14:28:57 -04:00
comfyanonymous 5282f56434 Implement Linear hypernetworks.
Add a HypernetworkLoader node to use hypernetworks.
2023-04-23 12:35:25 -04:00
comfyanonymous 3696d1699a Add support for GLIGEN textbox model. 2023-04-19 11:06:32 -04:00
comfyanonymous deb2b93e79 Move code to empty gpu cache to model_management.py 2023-04-15 11:19:07 -04:00
comfyanonymous 1e1875f674 Print xformers version and warning about 0.0.18 2023-04-09 01:31:47 -04:00
comfyanonymous 64557d6781 Add a --force-fp32 argument to force fp32 for debugging. 2023-04-07 00:27:54 -04:00
comfyanonymous bceccca0e5 Small refactor. 2023-04-06 23:53:54 -04:00
藍+85CD 05eeaa2de5
Merge branch 'master' into ipex 2023-04-07 09:11:30 +08:00
藍+85CD 3e2608e12b Fix auto lowvram detection on CUDA 2023-04-06 15:44:05 +08:00
藍+85CD 7cb924f684 Use separate variables instead of `vram_state` 2023-04-06 14:24:47 +08:00
藍+85CD 84b9c0ac2f Import intel_extension_for_pytorch as ipex 2023-04-06 12:27:22 +08:00
EllangoK e5e587b1c0 seperates out arg parser and imports args 2023-04-05 23:41:23 -04:00
藍+85CD 37713e3b0a Add basic XPU device support
closed #387
2023-04-05 21:22:14 +08:00
comfyanonymous e46b1c3034 Disable xformers in VAE when xformers == 0.0.18 2023-04-04 22:22:02 -04:00
Francesco Yoshi Gobbo f55755f0d2
code cleanup 2023-03-27 06:48:09 +02:00
Francesco Yoshi Gobbo cf0098d539
no lowvram state if cpu only 2023-03-27 04:51:18 +02:00
comfyanonymous 4adcea7228 I don't think controlnets were being handled correctly by MPS. 2023-03-24 14:33:16 -04:00
Yurii Mazurevich fc71e7ea08 Fixed typo 2023-03-24 19:39:55 +02:00
Yurii Mazurevich 4b943d2b60 Removed unnecessary comment 2023-03-24 14:15:30 +02:00
Yurii Mazurevich 89fd5ed574 Added MPS device support 2023-03-24 14:12:56 +02:00
comfyanonymous 3ed4a4e4e6 Try again with vae tiled decoding if regular fails because of OOM. 2023-03-22 14:49:00 -04:00
comfyanonymous 9d0665c8d0 Add laptop quadro cards to fp32 list. 2023-03-21 16:57:35 -04:00
comfyanonymous ee46bef03a Make --cpu have priority over everything else. 2023-03-13 21:30:01 -04:00
comfyanonymous 83f23f82b8 Add pytorch attention support to VAE. 2023-03-13 12:45:54 -04:00
comfyanonymous a256a2abde --disable-xformers should not even try to import xformers. 2023-03-13 11:36:48 -04:00
comfyanonymous 0f3ba7482f Xformers is now properly disabled when --cpu used.
Added --windows-standalone-build option, currently it only opens
makes the code open up comfyui in the browser.
2023-03-12 15:44:16 -04:00
comfyanonymous afff30fc0a Add --cpu to use the cpu for inference. 2023-03-06 10:50:50 -05:00
comfyanonymous ebfcf0a9c9 Fix issue. 2023-03-03 13:18:01 -05:00
comfyanonymous fed315a76a To be really simple CheckpointLoaderSimple should pick the right type. 2023-03-03 11:07:10 -05:00
comfyanonymous c1f5855ac1 Make some cross attention functions work on the CPU. 2023-03-03 03:27:33 -05:00
comfyanonymous 69cc75fbf8 Add a way to interrupt current processing in the backend. 2023-03-02 14:42:03 -05:00
comfyanonymous 2c5f0ec681 Small adjustment. 2023-02-27 20:04:18 -05:00
comfyanonymous 86721d5158 Enable highvram automatically when vram >> ram 2023-02-27 19:57:39 -05:00
comfyanonymous 2326ff1263 Add: --highvram for when you want models to stay on the vram. 2023-02-17 21:27:02 -05:00
comfyanonymous d66415c021 Low vram mode for controlnets. 2023-02-17 15:48:16 -05:00
comfyanonymous 4efa67fa12 Add ControlNet support. 2023-02-16 10:38:08 -05:00
comfyanonymous 7e1e193f39 Automatically enable lowvram mode if vram is less than 4GB.
Use: --normalvram to disable it.
2023-02-10 00:47:56 -05:00
comfyanonymous 708138c77d Remove print. 2023-02-08 14:51:18 -05:00
comfyanonymous 853e96ada3 Increase it/s by batching together some stuff sent to unet. 2023-02-08 14:24:00 -05:00
comfyanonymous c92633eaa2 Auto calculate amount of memory to use for --lowvram 2023-02-08 11:42:37 -05:00
comfyanonymous 534736b924 Add some low vram modes: --lowvram and --novram 2023-02-08 11:37:10 -05:00
comfyanonymous a84cd0d1ad Don't unload/reload model from CPU uselessly. 2023-02-08 03:40:43 -05:00