Text_Summarizer_PyTorch

This project aims to build a text summarizer with a web frontend, using PyTorch and Hugging Face.

Instructions to use

  • Pull the image from DockerHub:
    docker pull antonbeloval08/text-summarizer

OR

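  • Build the image yourself (run the build from inside the cloned repository):

    git clone https://github.com/Commit2Cosmos/Text_Summarizer_PyTorch.git
    cd Text_Summarizer_PyTorch
    docker build --no-cache -t text-summarizer .

  • Create a container (if you pulled the image from DockerHub, use antonbeloval08/text-summarizer as the image name):

    docker run --name text-sumarizer-test -p 8000:8000 text-summarizer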
  • Visit the webapp: http://localhost:8000 (some browsers do not accept http://0.0.0.0:8000)

  • To train: Click on the "Training" section, press "Try it out" and press "Execute"

  • To summarize: Click on the "Inference" section, press "Try it out", enter the text you want summarized and press "Execute"
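
The "Try it out" and "Execute" steps above send an HTTP request to the running container, so the same call can be scripted. The following is a minimal sketch under stated assumptions: the route `/predict` and the JSON field `text` are hypothetical placeholders, not taken from this repository. Check the "Inference" section of the web app for the actual path and parameter names before using it.

```python
import json
from urllib import request

# Assumption: the container from "Create a container" is running locally.
BASE_URL = "http://localhost:8000"
# Hypothetical route; the real one is shown in the "Inference" section of the web app.
INFERENCE_PATH = "/predict"


def build_inference_request(text: str) -> request.Request:
    """Build (but do not send) a POST request carrying the text to summarize."""
    # Hypothetical payload shape: a JSON object with a single "text" field.
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        url=BASE_URL + INFERENCE_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text: str, timeout: float = 60.0) -> str:
    """Send the request to the running container and return the raw response body."""
    with request.urlopen(build_inference_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `summarize("some long article text")` only works while the container is running. Building the request separately from sending it makes the payload easy to inspect without a live server.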

-------------------------- DEV NOTES --------------------------

TODO

  • Add ability to control parameters in params.json in the web app

  • Change training + evaluation components to work with multiple datasets (dynamic save paths etc.)

  • Resolve the warning "Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file"

  • Raise an issue about the incorrect warning that the model needs to be trained because of newly initialised encoding layers
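
As a note on the GenerationConfig item above: the warning asks for generation settings to be kept apart from the model architecture settings. The sketch below shows only that separation, on plain dictionaries, without calling the Transformers library. The key set is an illustrative subset of common generation parameters and `split_generation_params` is a hypothetical helper, not part of this repository. In practice the separated values would be handed to the library's `GenerationConfig` class.

```python
from typing import Any, Dict, Tuple

# Illustrative subset of generation-related keys (assumption: extend to match the model).
GENERATION_KEYS = {"max_length", "min_length", "num_beams", "length_penalty"}


def split_generation_params(
    config: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (model_config, generation_config) with generation keys moved out."""
    # Everything that is not a generation setting stays in the model config.
    model_cfg = {k: v for k, v in config.items() if k not in GENERATION_KEYS}
    # Generation settings are collected into their own dictionary.
    gen_cfg = {k: v for k, v in config.items() if k in GENERATION_KEYS}
    return model_cfg, gen_cfg
```

For example, `split_generation_params({"d_model": 1024, "num_beams": 8})` returns the architecture entry in the first dictionary and `num_beams` in the second.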

Milestones

  • Choose and download the pre-trained model and dataset for transfer learning
  • Build pipelines (listed below)
  • Check for transfer learning or fine-tuning (Untrained layers are already provided, so just train them -> update training pipeline)
  • Model packaging (serialisation, containerisation) -> Docker
  • Choose deployment strategy (cloud or local) and interaction type (API, webapp, cli, embedded systems)
  • Train the model with VertexAI (separate data and model -> don't use the container) and upload trained weights

(Optional):

  • Build + deploy custom webapp
  • Support for multiple datasets
  • Add output text size control feature
  • Add context area for user defined personalisations
  • Add support for pdf (and other) files (multimodality)

Pipelines

  • Data Ingestion
  • Data Validation
  • Data Transformation + Feature Engineering
  • Model Training
  • Model Evaluation
  • Model Inference
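
The stages listed above run in a fixed order, each one depending on the output of the previous one. A minimal sketch of such a runner follows. `run_stages` and the stage callables are hypothetical names for illustration; the actual orchestration lives in main.py and src/pipeline and may be structured differently.

```python
from typing import Callable, List, Tuple

# A stage is a display name plus a zero-argument callable that does the work.
Stage = Tuple[str, Callable[[], None]]


def run_stages(stages: List[Stage]) -> List[str]:
    """Run stages in order and return the names of the stages that completed.

    An exception raised by a stage propagates to the caller, so later
    stages never run on top of a failed one.
    """
    completed: List[str] = []
    for name, stage in stages:
        print(f">>> {name} started")
        stage()
        print(f">>> {name} completed")
        completed.append(name)
    return completed
```

Usage would look like `run_stages([("Data Ingestion", ingest), ("Data Validation", validate)])`, where `ingest` and `validate` stand in for the project's own pipeline entry points.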

Workflow (Files to update)

See the architecture file for a detailed breakdown of the project's architecture and what each file does.

  • logging.py

  • pyproject.toml

  • params.json

  • config.json

  • src/entity

  • src/config

  • src/components

  • src/pipeline

  • main.py

  • app.py
