Merge remote-tracking branch 'origin/paper-submission'
hatellezp committed Jul 24, 2024
2 parents 1b7e61a + 49810ee commit f6ecce7
Showing 25 changed files with 652 additions and 472 deletions.
150 changes: 73 additions & 77 deletions README.md
@@ -6,131 +6,92 @@ Models can be trained using a mix of real and generated data. They can also be l

<img src="docs/images/general_pipeline.png" />

## Installation

We recommend using a virtual environment. Install the requirements:

```
pip install -r requirements.txt
```

## Datasets

We experimented with the PEOPLE category from the following datasets:

- [COCO](https://cocodataset.org/#home) PEOPLE dataset:

```
./run.sh coco
```

- [Flickr30K Entities](https://bryanplummer.com/Flickr30kEntities/) PEOPLE dataset:

```
./run.sh flickr30k
```


Data will be downloaded and placed in the respective directories for images, labels, and captions.
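For reference, here is a sketch of the layout implied by the data paths in `conf/config.yaml` (the exact sub-folder roles are assumptions, not guaranteed by the scripts):

```
data/
├── real/
│   ├── images/     # downloaded images
│   ├── labels/     # object labels
│   └── captions/   # optional captions
├── generated/      # synthetic images, one sub-folder per ControlNet
└── datasets/       # dataset split files (train.txt, val.txt, test.txt)
```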

## Generate images

To generate some images, you can use

```bash
./run.sh gen
```

See the `conf/config.yaml` file for all details and configuration options.

You can also override the configuration directly on the command line:

```bash
./run.sh gen model.cn_use=openpose prompt.base="Trump" prompt.modifier="dancing" data_path.generated=mysupertest
./run.sh gen model.cn_use=openpose prompt.base="Arnold" prompt.modifier="dancing"
```

If you use the `controlnet_segmentation` ControlNet, you will find your images in `data/generated/controlnet_segmentation`, along with the base image and the extracted feature.

<div><img width="350" src="docs/images/1_1.png"/></div>
<div><img width="350" src="docs/images/2_1.png"/></div>
The configuration options work for all scripts available in the framework. For example, you can set different initial dataset sizes by controlling the number of samples:

<div><img width="350" src="docs/images/3_1.png"/></div>
<div><img width="350" src="docs/images/4_1.png"/></div>


```bash
./run.sh coco ml.train_nb=500
```

## Multi run

You can also launch multiple runs. Here's an example of a multi-run with 3 different generators:

```
./run.sh gen model.cn_use=frankjoshua_openpose,fusing_openpose,lllyasviel_openpose
```

The list of available models can be found in `conf/config.yaml`. We have 4 available extractors at the moment (Segmentation, OpenPose, Canny, MediaPipeFace). If you add another ControlNet model, make sure its name contains one of the following strings to set the extractor to use (see the sketch after the list):

- openpose
- canny
- segmentation
- mediapipe_face
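
For example, a hypothetical new entry in `conf/config.yaml` could look like the sketch below; the `_canny` suffix in the key is what selects the Canny extractor (the key name `my_custom_canny` is a placeholder):

```yaml
model:
  cn_use: my_custom_canny # select the new model for generation
  cn:
    # the "_canny" suffix tells the framework to use the Canny extractor
    - my_custom_canny: lllyasviel/sd-controlnet-canny
```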


## Test the quality of images with IQA measures

One way of testing the quality of the generated images is to use computational and statistical methods. One good library for this is [IQA-PyTorch](https://github.com/chaofengc/IQA-PyTorch); you can go read its [paper](https://arxiv.org/pdf/2208.14818.pdf).

There are two approaches to measure image quality:
- full reference: compare against a real pristine image
- no reference: compute metrics following a learned opinion

You can use these measures in the same way the generation is done:
```bash
# For paper
./run.sh src/iqa_paper.py
# Framework general script
./run.sh iqa
```

It follows the same configuration as the generation part, using the same `conf/config.yaml` file.
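
For example, a sketch (assuming the same hydra override syntax used for generation) that restricts the metrics and selects which generated dataset to score:

```bash
# hypothetical overrides; iqa.device, iqa.metrics and model.cn_use are keys from conf/config.yaml
./run.sh iqa iqa.device=cuda 'iqa.metrics=[brisque,dbcnn]' model.cn_use=lllyasviel_canny
```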

<img src="docs/images/iqa_measure.png" />


### IQA paper

In this script the approach to measure quality uses the extensive library
IQA-PyTorch: https://github.com/chaofengc/IQA-PyTorch
Read also the paper: https://arxiv.org/pdf/2208.14818.pdf

There are basically two approaches to measure image quality:
- full reference: compare against a real pristine image
- no reference: compute metrics following a learned opinion

Because the images are generated, there is no reference image to compare against, so we use the no-reference metrics.

Note that the methods used here are agnostic to the content of the image: no subjective or conceptual score is given. The measures only give an idea of how 'good looking' the images are.

Methods used:
- brisque: https://www.sciencedirect.com/science/article/abs/pii/S0730725X17301340
- dbcnn: https://arxiv.org/pdf/1907.02665v1.pdf
- niqe: https://live.ece.utexas.edu/research/quality/nrqa.html



- dbcnn is good for: blur, contrast distortion, white and pink noise, dithering, over- and under-exposure
- brisque is good for: distortion, luminance and blur
- ilniqe is good for: distortion, blur and compression distortion

Note that the metrics each have different ranges ([0, 1], [0, +inf]) and sometimes less is better while sometimes more is better; it would be a mistake to try to rescale or invert them. It is better to treat each one separately.
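
For illustration, here is a minimal standalone sketch using pyiqa directly (this is not the project's `iqa` script; the random tensor is a placeholder for real generated images):

```python
import torch
import pyiqa

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# no-reference metrics used in this project
brisque = pyiqa.create_metric("brisque", device=device)  # lower is better
dbcnn = pyiqa.create_metric("dbcnn", device=device)      # higher is better

# placeholder batch; in practice pass generated images as (N, 3, H, W) tensors in [0, 1]
img = torch.rand(1, 3, 512, 512, device=device)

print("brisque:", brisque(img).item(), "lower_better =", brisque.lower_better)
print("dbcnn:  ", dbcnn(img).item(), "lower_better =", dbcnn.lower_better)
```

The `lower_better` flag is what tells you how to rank images when sampling on IQA scores.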

A file is created at `data/iqa/<cn_use>_iqa.json` with the following structure:

@@ -179,19 +140,54 @@ wandb login
Create `train.txt`, `val.txt`, and `test.txt`:

```
./run.sh create_dataset
```

Launch the training!

```
./run.sh train
```

You can also create the dataset and launch training in one step, which lets you execute multiple trainings with multiple augmentation percentages on your server using hydra:

```
./run.sh create_n_train.py -m ml.augmentation_percent=0.1 ml.sampling.enable=True ml.sampling.metric=dbcnn,brisque ml.sampling.sample=best ml.epochs=15
```

## Download and test models

The download folder can be set in the config file. It will contain one folder per wandb project, and each project folder contains (see the sketch after the commands below):

- all models for the project
- a summary file with parameters and results (mAP, precision, etc.)
- when running `test.py`: a results.csv file containing the test results with mAP and other info

```
python src/download.py ml.wandb.project=your-project ml.wandb.download.download=true ml.wandb.download.list_all=true

python src/test.py
```
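
The resulting layout looks roughly like this (file and folder names other than `results.csv` are hypothetical and depend on your wandb runs):

```
<download_folder>/        # set in conf/config.yaml
└── your-project/         # one folder per wandb project
    ├── <run_name>.pt     # downloaded model weights
    ├── summary file      # parameters and results (mAP, precision, ...)
    └── results.csv       # written when running test.py
```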
**Note**: Other scripts exist to execute different studies, such as Active Learning, which is still experimental. You can check the `src` folder for those scripts (this code is not yet fully integrated into the framework, so some path or configuration modifications might be necessary for correct execution).

## Runs Results Plots

Here are plots from some of the many runs and studies that we performed:

### Coco Sampling

<img src="docs/images/COCO_all_samplings_2.png" />

### Flickr Sampling

<img src="docs/images/flickr_all_samplings.png" />

### Loss values for COCO

<img src="docs/images/COCO_loss.png" />

### Random Sampling - Regular Runs

<img src="docs/images/random_sampling.png" />

82 changes: 50 additions & 32 deletions conf/config.yaml
@@ -12,30 +12,34 @@
# If not, see <http://www.gnu.org/licenses/>.

data:
# data paths and formats configuration :
# create data folder, and all other folders will be created automatically
base: "data"
real: "real" # should contains images/, labels/, and captions/ (captions are optional)
real: "real" # Will contain images/, labels/, and captions/ (captions are optional)
generated: "generated"
datasets: "datasets"
image_formats: ["jpeg", "jpg"]

# For the future: put every parameter related to machine learning here: dataset size,
# ratio between train and test, learning rate ...
ml:
# number of samples for training, validation, and test
val_nb: 300
test_nb: 300
train_nb: 250

augmentation_percent: 0.1 # controls all augmentation percent parameters everywhere
augmentation_percent_baseline: 0 # Ablation study augmentation => For paper only
epochs: 300
sampling: # IQA Sampling, check end of this yaml file for list of available metrics
# Telling the trainer to use IQA metrics already calculated to sample on best images for training
enable: false
metric: brisque # brisque (smaller is better), dbcnn (bigger is better), ilniqe (smaller is better)
sample: best # whether to take smaller or bigger values is decided depending on the metric

wandb: # set wandb parameters for run tracking and model logging
entity: your-wandb-username
project: sdcn-project
download:
list_all: false
list_finished: true
@@ -46,12 +50,20 @@ ml:
query_filter: false

prompt:
# if use_captions is set to 1 in "model", a vocabulary will be used
# to modify the original captions and generate newer captions
# to create multiple diverse synthetic images from the same original image
template: vocabulary
modify_captions: 1
generation_size: 10

# POSITIVE PROMPTS: THIS section is used if use_captions is set to 0 in "model"
# This section can be used if your dataset doesn't have captions already included
base: ["Sandra Oh", "Kim Kardashian", "rihanna ", "taylor swift"]
quality: "showing emotion, great realistic face, best quality, extremely detailed,"
modifier: "Happy man smiling"

# Best negative prompts were chosen for better image quality generation
negative:
[
"monochrome, lowres, bad anatomy, worst quality, low quality, cartoon, unrealistic, bad proportion,",
Expand All @@ -62,18 +74,29 @@ prompt:
"extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs,",
"extra arms, extra legs, fused fingers, too many fingers, long neck, no cars, no people, illustration, painting,",
"drawing, art, sketch, anime, deformation, distorsion",
]
negative_simple: "monochrome, lowres, bad anatomy, worst quality, low quality, cartoon, unrealistic, bad proportion, disfigured, mutation, deformed mouth, deformed eyes, unnatural teeth, unnatural eyes, unnatural mouth, unnatural face, unnatural face expression, not human"

model:
use_captions: 1 # use captions that come with the dataset
# use_labels: 0 # not used for now, will be used in future features
sd: runwayml/stable-diffusion-v1-5 # stable diffusion version
cn_use: controlnet_segmentation # control net to use from the list
cn:
# You should list all the ControlNets that you would like to use here
# add an extractor name to the end in order to define which extractor to use :
# - _segmentation
# - _canny
# - _openpose
# - _mediapipe_face
#
# All models listed below can be found on hugging face

- controlled_false_segmentation: lllyasviel/sd-controlnet-seg # bad extractor, just for paper tests

# Segmentation
- controlnet_segmentation: lllyasviel/sd-controlnet-seg

# Canny
- lllyasviel_canny: lllyasviel/sd-controlnet-canny
- lllyasviel_scribble_canny: lllyasviel/sd-controlnet-scribble
@@ -86,28 +109,23 @@ model:
# MediaPipeFace
- crucible_mediapipe_face: CrucibleAI/ControlNetMediaPipeFace

# Coming Soon !
# - depth: lllyasviel/sd-controlnet-depth
# - hed: lllyasviel/sd-controlnet-hed
# - normal: lllyasviel/sd-controlnet-normal
# - scribble: lllyasviel/sd-controlnet-scribble
# - segmentation: lllyasviel/sd-controlnet-seg
# - mlsd: lllyasviel/sd-controlnet-mlsd

cn_extra_settings:
# In case your ControlNet class takes other parameters, use this section to define them
crucible_mediapipe_face:
subfolder: diffusion_sd15
seed: 34567 # random seed for SDCN generation
device: cuda # cpu or cuda ?

# IQA METRICS SAMPLING: calculate all IQA metrics listed for all generated datasets
iqa:
device: cuda # cpu or cuda ?
metrics: [brisque, dbcnn, nima, ilniqe] # metrics to calculate scores
# available metrics: brisque, clipiqa+, dbcnn, ilniqe, niqe, nima, cnniqa, nrqm, pi
# read more on : https://github.com/chaofengc/IQA-PyTorch/blob/main/docs/ModelCard.md
# ACTIVE LEARNING SAMPLING
active:
enable: False
rounds: 5 # rounds of AL
sel: 125 # number of samples added each round
sampling: confidence # confidence, coreset, baseline (for ablation study)
Binary file removed docs/images/1_1.png
Binary file not shown.
Binary file removed docs/images/2_1.png
Binary file not shown.
Binary file removed docs/images/3_1.png
Binary file not shown.
Binary file removed docs/images/4_1.png
Binary file not shown.
Binary file removed docs/images/5_1.png
Binary file not shown.
Binary file added docs/images/COCO_all_samplings_2.png
Binary file added docs/images/COCO_loss.png
Binary file removed docs/images/b_1.png
Binary file not shown.
Binary file removed docs/images/f_1.png
Binary file not shown.
Binary file added docs/images/flickr_all_samplings.png
Binary file modified docs/images/iqa_measure.png
Binary file added docs/images/random_sampling.png
