How to ignore images when converting? #1110

NoNarne · 2025-03-10T02:41:39Z

markitdown converts images in a document into the following form:
![title](url)
Is there any way to make markitdown ignore these images and not generate any content?

PhiFever · 2025-03-11T02:11:19Z

I use this simple tool to ignore images:

import io
import re

from markitdown import MarkItDown


class Html2MarkdownConverter:
    def __init__(self):
        self.converter = MarkItDown()

    def convert(self, html_str: str) -> str:
        # feed the HTML string to markitdown as an in-memory byte stream
        return self.converter.convert_stream(
            io.BytesIO(html_str.encode("utf8")), file_extension=".html"
        ).text_content

    def convert_without_images(self, html_str: str) -> str:
        markdown_text = self.convert(html_str)
        # remove images (pattern: ![alt text](URL))
        no_images = re.sub(r"!\[.*?]\(.*?\)", "", markdown_text)

        # replace links with their text (pattern: [text](URL))
        no_links = re.sub(r"\[(.*?)]\(.*?\)", r"\1", no_images)

        # collapse runs of three or more newlines into a single blank line
        cleaned_text = re.sub(r"\n{3,}", "\n\n", no_links)

        return cleaned_text
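
If only images should go and links should stay, the same post-processing idea can be narrowed down. The following is a minimal sketch, not part of markitdown: `strip_images` and `IMAGE_PATTERN` are names made up for illustration, and the pattern only covers inline images of the form `![alt](url)`. It tolerates one level of parentheses inside the URL, but it does not handle reference-style images or alt text that itself contains `]`.

```python
import re

# Inline Markdown images: ![alt](url) or ![alt](url "title").
# The URL part accepts one level of balanced parentheses, e.g. fig_(1).png.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\((?:[^()]|\([^()]*\))*\)")


def strip_images(markdown_text: str) -> str:
    """Remove inline images from Markdown text and leave links untouched."""
    without_images = IMAGE_PATTERN.sub("", markdown_text)
    # Removing an image that sat on its own line leaves extra blank lines behind.
    return re.sub(r"\n{3,}", "\n\n", without_images)


sample = (
    "Intro\n\n"
    "![diagram](https://example.com/fig_(1).png)\n\n"
    "See [docs](https://example.com)."
)
print(strip_images(sample))
```

It runs on the Markdown string that markitdown returns, so it can be applied to `text_content` from any input format, not just HTML.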

NoNarne · 2025-03-11T03:23:51Z

Thanks! The "remove link" is useful for me too.

So it seems that markitdown does not currently support ignoring images and links? I hope it will be supported soon.

afourney · 2025-03-11T05:17:55Z

Thanks. Some folks are advocating for embedding images via data-uris. Others are advocating for removing images entirely. A third option is to consider them sub-documents, and convert them into text recursively (this would potentially give you image captions and metadata). It's clear that some additional control or configuration is needed.

I want to think a bit about how to proceed here -- it will likely involve some decisions about how to deal with sub-documents.

In the meantime, I can probably add some conversion options to the Python interface to support this -- but it might end up deprecated once there's a better design around sub-documents. What do you think?
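
For readers unfamiliar with the first option mentioned above, embedding via data URIs means placing the base64-encoded image bytes directly in the Markdown image target. This is a rough sketch using only the standard library. The helper names `to_data_uri` and `embed_image` are hypothetical and say nothing about how markitdown does or would implement it.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a data URI such as data:image/png;base64,...."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def embed_image(alt_text: str, image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Build a Markdown image whose target is the embedded data URI."""
    return f"![{alt_text}]({to_data_uri(image_bytes, mime_type)})"
```

The trade-off is visible from the encoding itself: the document becomes self-contained, but every image adds roughly a third more than its byte size to the Markdown text.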

NoNarne · 2025-03-11T05:41:04Z

I suggest making this optional, for example:

markitdown path-to-file.pdf -o document.md --ignore-images
md = MarkItDown(ignore_images=True)
or
markitdown path-to-file.pdf -o document.md --save-images path_to_save
md = MarkItDown(save_images=path_to_save)
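
To make the `save_images` idea concrete, here is a sketch of what such an option could do as a post-processing step today. It assumes the Markdown already contains images embedded as base64 data URIs. `save_embedded_images` and the `image_N` file naming are invented for this example, and neither `--save-images` nor `save_images=` is an existing markitdown option as far as this thread shows.

```python
import base64
import re
from pathlib import Path

# Inline images whose target is a base64 data URI,
# e.g. ![alt](data:image/png;base64,AAAA)
DATA_URI_IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\(data:image/(?P<subtype>[a-zA-Z0-9.+-]+);base64,(?P<payload>[A-Za-z0-9+/=]+)\)"
)


def save_embedded_images(markdown_text: str, output_dir: str) -> str:
    """Write each embedded image to output_dir and point the Markdown at the file."""
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)
    counter = 0

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        # Hypothetical naming scheme: image_1.png, image_2.jpeg, ...
        # The MIME subtype is used as the extension, which is crude for svg+xml.
        file_path = target / f"image_{counter}.{match.group('subtype')}"
        file_path.write_bytes(base64.b64decode(match.group("payload")))
        return f"![{match.group('alt')}]({file_path.as_posix()})"

    return DATA_URI_IMAGE.sub(replace, markdown_text)
```

Images that reference an external URL are left as they are, since saving those would require downloading them.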

afourney · 2025-03-11T13:18:08Z

Yes, for sure it will be optional/configurable.
