
Add SGLang integration #1460

Open
rlouf opened this issue Feb 28, 2025 · 1 comment

rlouf commented Feb 28, 2025

We should look into integrating SGLang as an inference library in Outlines.

rlouf added this to the 1.0 milestone Feb 28, 2025
rlouf commented Mar 5, 2025

I would like to try a new interface for servers, which should make it easier to use local servers such as vLLM and SGLang for small workloads:

from typing import List, Union


class SgLang:
    """Simple SgLang server interface.

    Automatically handles server startup/shutdown and provides a
    clean interface for text generation.
    """

    def __init__(
        self,
        model_id: str,
        host: str = "127.0.0.1",
        port: int = 8000,
        **kwargs
    ):
        """Initialize a SgLang server."""
        self.model_id = model_id
        self.host = host
        self.port = port
        self._server_process = None
        self._client = None
        self._url = f"http://{self.host}:{self.port}/v1"

    def __enter__(self):
        """Start the server when entering the context manager."""
        self.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Stop the server when exiting the context manager."""
        self.stop()

    async def __aenter__(self):
        """Start the server when entering the async context manager."""
        self.start()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Stop the server when exiting the async context manager."""
        self.stop()

    def start(self):
        """Start the SgLang server in a subprocess and create the HTTP
        client. Poll the /health endpoint until the server is up."""
        pass

    def stop(self):
        """Stop the SgLang server."""
        pass

    def generate(
        self,
        prompt: Union[str, List[str]],
        **kwargs
    ) -> Union[str, List[str]]:
        """Generate text from a prompt or batch of prompts (synchronous)."""
        pass

    async def generate_async(
        self,
        prompt: Union[str, List[str]],
        **kwargs
    ) -> Union[str, List[str]]:
        """Generate text from a prompt or batch of prompts (asynchronous)."""
        pass

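The `start` method's docstring calls for polling a `/health` endpoint until the server is up. A minimal, stdlib-only sketch of such a readiness check (the function name, endpoint path, and timeout defaults are illustrative assumptions, not part of the proposal):

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Returns True as soon as a 200 response is seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not accepting connections yet; retry after a short sleep.
            pass
        time.sleep(interval)
    return False
```

`start` could then launch the subprocess and call `wait_until_healthy(f"http://{self.host}:{self.port}/health")` before returning, raising if it comes back `False`.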
This can be used both synchronously and asynchronously:

import outlines

with outlines.servers.sglang() as model:
    result = model("prompt", dict)

async def main():
    async with outlines.servers.sglang() as model:
        result = await model("prompt", dict)
    return result
