
❌ Errors occurred during the pipeline run, see logs for more details (Part 2) #25

mscarroll opened this issue Jul 20, 2024 · 9 comments

@mscarroll

This is the error that I'm getting (Windows, PowerShell; Ollama installed):

```
16 6091f6e9e75fb0c08b45612806cf11e6 OLO (You Only Look Once) and Faster R-CNN leve... ... 300
17 6da66fe5d9df2b209d8e8cb274389bea can be a limiting factor in domains where dat... ... 300
18 31170fdcb9137905634fbe1f6f7312cd s with other neural network types, such as rec... ... 126

[19 rows x 5 columns]
🚀 create_base_extracted_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠹ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 4 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details
```

LOG FILE (last lines):

```
File "C:\XXX\gen2Env\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
```

@yurochang

Same problem.

@Computational-social-science

Similar problem.

@myyourgit

Same problem here.

@yakeworld

Similar problem.

@haiyangheart

See the patch in eust-w/graphrag@ad49d9a.

@mscarroll (Author)

Update:
I couldn't get @haiyangheart's patch to work (no doubt my own misunderstanding), but I went back and looked at other issues in this thread and noticed that issue #13 is actually the same as mine. I tried the fix suggested by @severian42 (manually copying the configuration yaml file).
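For reference, that fix amounts to dropping the repo's configuration yaml into the GraphRAG working directory. A minimal sketch with assumed paths; a manual copy works just as well:

```python
import shutil

# Assumed layout: the graphrag-local-ollama checkout sits next to the
# "ragtest" working directory that --root points at; adjust to your paths.
shutil.copy("graphrag-local-ollama/settings.yaml", "ragtest/settings.yaml")
```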

This allowed me to get past the current error.

Sadly, after a couple of hours of running, I ran into a new one.

❌ create_final_community_reports

The last lines in the log file are:

```
File "\Lib\site-packages\pandas\core\indexes\range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'community'
```
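For context, this KeyError is pandas failing to find a `community` column, and the traceback going through `range.py` suggests the frame's columns are still a default RangeIndex, i.e. the upstream step never named them. A minimal illustration, not the workflow's actual code:

```python
import pandas as pd

# Columns that were never named end up as a default RangeIndex (0, 1, ...),
# which is why the traceback goes through pandas/core/indexes/range.py.
df = pd.DataFrame([["report_0", 0]])

# Raises KeyError: 'community' because no column has that label.
df["community"]
```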

Thanks

@alexgoller

Same error here.

@fivehaitao

fivehaitao commented Sep 9, 2024

> Update: I couldn't get @haiyangheart's patch to work (no doubt my own misunderstanding), but I went back and looked at other issues in this thread and noticed that issue #13 is actually the same as mine. I tried the fix suggested by @severian42 (manually copying the configuration yaml file).
>
> This allowed me to get past the current error.
>
> Sadly, after a couple of hours of running, I ran into a new one.
>
> ❌ create_final_community_reports, with `KeyError: 'community'` as the last line in the log file.
>
> Thanks

I am happy to say that it works. If you are using llama3, I hope copying the yaml file into your config works for you too.

Here is the link:
https://github.com/TheAiSingularity/graphrag-local-ollama/issues/25#issuecomment-2337039447

@fivehaitao

fivehaitao commented Sep 9, 2024

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://192.168.0.239:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 3
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://192.168.0.239:11434/api/embeddings
    api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 64
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 1000
  max_input_length: 4000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: yes
  raw_entities: yes
  top_level_nodes: yes

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
