Strip sitemap entries (langchain-ai#2132)
Loading this sitemap didn't work for me:
https://www.alzallies.com/sitemap.xml

Stripping leading and trailing whitespace from each `loc` entry before
scraping fixed it, and it seems like a good idea to do in general.

Integration tests pass
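
To illustrate why stripping helps, here is a minimal sketch, not the loader's actual code. It assumes the failure came from whitespace surrounding the URL inside a `<loc>` element, and it uses the standard library's `xml.etree.ElementTree` rather than the loader's own parser. The sample XML, the `example.com` URLs, and the `extract_locs` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap: the first <loc> is pretty-printed, so its text
# carries newlines and indentation around the URL.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>
      https://example.com/page-one
    </loc>
  </url>
  <url>
    <loc>https://example.com/page-two</loc>
  </url>
</urlset>
"""

# Namespace prefix mapping for the sitemap protocol namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text, strip=True):
    """Return the text of every <loc> element, optionally stripped."""
    root = ET.fromstring(xml_text)
    locs = []
    for loc in root.findall("sm:url/sm:loc", NS):
        text = loc.text or ""
        locs.append(text.strip() if strip else text)
    return locs


raw_locs = extract_locs(SITEMAP_XML, strip=False)
clean_locs = extract_locs(SITEMAP_XML)

# repr() makes the surrounding whitespace in the raw value visible.
print(repr(raw_locs[0]))
print(repr(clean_locs[0]))
```

Without the strip, the first value begins with a newline and indentation, which is not a usable URL for a scraper; with it, both entries come out as clean URLs.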
LeSphax authored Mar 29, 2023
1 parent 27f8078 commit 4ab66c4
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion langchain/document_loaders/sitemap.py
@@ -58,7 +58,7 @@ def load(self) -> List[Document]:
 
         els = self.parse_sitemap(soup)
 
-        results = self.scrape_all([el["loc"] for el in els if "loc" in el])
+        results = self.scrape_all([el["loc"].strip() for el in els if "loc" in el])
 
         return [
             Document(