
❌ Errors occurred during the pipeline run, see logs for more details (Part 2) #25

mscarroll opened this issue Jul 20, 2024 · 9 comments

@mscarroll

This is the error that I'm getting (Windows, PowerShell; Ollama installed):

```
16 6091f6e9e75fb0c08b45612806cf11e6 OLO (You Only Look Once) and Faster R-CNN leve... ... 300
17 6da66fe5d9df2b209d8e8cb274389bea can be a limiting factor in domains where dat... ... 300
18 31170fdcb9137905634fbe1f6f7312cd s with other neural network types, such as rec... ... 126

[19 rows x 5 columns]
🚀 create_base_extracted_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠹ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 4 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details
```

LOG FILE (last lines):

```
File "C:\XXX\gen2Env\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
```

@yurochang

Same problem.

@Computational-social-science

Similar problem.

@myyourgit

Same problem here.

@yakeworld

Similar problem.

@haiyangheart

See the patch in eust-w/graphrag@ad49d9a.

@mscarroll (Author)

Update:
I couldn't get @haiyangheart's patch to work (no doubt my own misunderstanding), but I went back and looked at other issues in this thread and noticed that issue #13 is actually the same as mine. I tried the fix suggested by @severian42 (manually copying the configuration yaml file).
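For reference, that fix amounts to dropping the repo's configuration yaml into the GraphRAG working directory. A minimal sketch with assumed paths; a manual copy works just as well:

```python
import shutil

# Assumed layout: the graphrag-local-ollama checkout sits next to the
# "ragtest" working directory that --root points at; adjust to your paths.
shutil.copy("graphrag-local-ollama/settings.yaml", "ragtest/settings.yaml")
```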

This allowed me to get past the current error.

Sadly, after a couple of hours of running, I ran into a new one.

❌ create_final_community_reports

The last lines in the log file are:

```
File "\Lib\site-packages\pandas\core\indexes\range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'community'
```
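For context, this KeyError is pandas failing to find a `community` column, and the traceback going through `range.py` suggests the frame's columns are still a default RangeIndex, i.e. the upstream step never named them. A minimal illustration, not the workflow's actual code:

```python
import pandas as pd

# Columns that were never named end up as a default RangeIndex (0, 1, ...),
# which is why the traceback goes through pandas/core/indexes/range.py.
df = pd.DataFrame([["report_0", 0]])

# Raises KeyError: 'community' because no column has that label.
df["community"]
```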

Thanks

@alexgoller

Same error here.

@fivehaitao

fivehaitao commented Sep 9, 2024

> Update: I couldn't get @haiyangheart's patch to work (no doubt my own misunderstanding), but I went back and looked at other issues in this thread and noticed that issue #13 is actually the same as mine. I tried the fix suggested by @severian42 (manually copying the configuration yaml file).
>
> This allowed me to get past the current error.
>
> Sadly, after a couple of hours of running, I ran into a new one.
>
> ❌ create_final_community_reports, with `KeyError: 'community'` as the last line in the log file.
>
> Thanks

I am happy to say that it works. If you are using llama3, I hope copying the yaml file into your config works for you too.

Here is the link:
https://github.com/TheAiSingularity/graphrag-local-ollama/issues/25#issuecomment-2337039447

@fivehaitao

fivehaitao commented Sep 9, 2024

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://192.168.0.239:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 3
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://192.168.0.239:11434/api/embeddings
    api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 64
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 1000
  max_input_length: 4000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: yes
  raw_entities: yes
  top_level_nodes: yes

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
