Skip to content

Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata

License

Notifications You must be signed in to change notification settings

zblurx/MetaFinder

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MetaFinder

Search for documents in a domain through Search Engines. The objective is to extract metadata.


Installation:

> pip3 install metafinder

Upgrades are also available using:

> pip3 install metafinder --upgrade

Usage

MetaFinder can be used in 2 ways:

CLI

metafinder -d domain.com -l 20 -o folder [-t 10] -go -bi -ba

Parameters:

  • d: Specifies the target domain.
  • l: Specify the maximum number of results to be searched in the searchs engines.
  • o: Specify the path to save the report.
  • t: Optional. Used to configure the threads (4 by default).
  • v: Show Metafinder version.
  • Search Engines to select (Google by default):
    • go: Optional. Search in Google.
    • bi: Optional. Search in Bing.
    • ba: Optional. Search in Baidu. (Experimental)

In Code

import metafinder.extractor as metadata_extractor

documents_limit = 5
domain = "target_domain"
result = metadata_extractor.extract_metadata_from_google_search(domain, documents_limit)
# result = metadata_extractor.extract_metadata_from_bing_search(domain, documents_limit)
# result = metadata_extractor.extract_metadata_from_baidu_search(domain, documents_limit)
authors = result.get_authors()
software = result.get_software()
for k,v in result.get_metadata().items():
    print(f"{k}:")
    print(f"|_ URL: {v['url']}")
    for metadata,value in v['metadata'].items():
        print(f"|__ {metadata}: {value}")

document_name = "test.pdf"
try:
    metadata_file = metadata_extractor.extract_metadata_from_document(document_name)
    for k,v in metadata_file.items():
        print(f"{k}: {v}")
except FileNotFoundError:
    print("File not found")

Example

image

Author

This project has been developed by:

Contributors

Disclaimer!

The software is designed to leave no trace in the documents we upload to a domain. The author is not responsible for any illegitimate use.

About

Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%