[Serious bug] text files that do not support Chinese content #424

Closed
Pandas886 opened this issue Jul 8, 2024 · 9 comments
Labels
community_support Issue handled by community members

Comments

@Pandas886

Pandas886 commented Jul 8, 2024

I attempted to run a RAG test using Qian Zhongshu's "Fortress Besieged" and encountered the following errors.

The pipeline output:

❌ create_final_community_reports
None
⠋ GraphRAG Indexer 
├── Loading Input (text) - 1 files loaded (0 filtered) ----- 100% 0:00:… 0:00:…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports❌ Errors occurred during the pipeline run, see logs for more details.

The logs show the following:

10:54:04,391 graphrag.llm.openai.utils ERROR error loading json, json=```json
{
    "title": "<garbled Chinese text (mojibake)>",
    "summary": "<garbled Chinese text (mojibake)>",
    "rating": 6.5,
    "rating_explanation": "<garbled Chinese text (mojibake)>",
    "findings": [
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (282), Relationships (639, 1316, 1323, 1317, 1320, 1325, 1321, 1318, 1324, 1322)]"
        },
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (122), Relationships (681, 1075, 276, 1073, 1079, 1071, 1063, 1076, 1074, 1077, 1078)]"
        },
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (282), Relationships (1316, 1321, 1318, 1320)]"
        },
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (122), Relationships (681, 276, 1074)]"
        },
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (282), Relationships (449, 540, 454, 1071, 1063, 1076, 1074, 1077, 1078)]"
        }
    ]
}
      
Traceback (most recent call last):
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
10:54:04,391 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\index\graph\extractors\community_reports\community_reports_extractor.py", line 58, in __call__
    await self._llm(
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\json_parsing_llm.py", line 34, in __call__
    result = await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_token_replacing_llm.py", line 37, in __call__
    return await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_history_tracking_llm.py", line 33, in __call__
    output = await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\caching_llm.py", line 104, in __call__
    result = await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 177, in __call__
    result, start = await execute_with_retry()
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 159, in execute_with_retry
    async for attempt in retryer:
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\asyncio\__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
    result = await action(retry_state)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 165, in execute_with_retry
    return await do_attempt(), start
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 147, in do_attempt
    return await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\base_llm.py", line 48, in __call__
    return await self._invoke_json(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 82, in _invoke_json
    result = await generate()
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 74, in generate
    await self._native_json(input, **{**kwargs, "name": call_name})
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 108, in _native_json
    json_output = try_parse_json_object(raw_output)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
10:54:04,394 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
10:54:04,394 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 71
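The replacement characters in the report above and the resulting JSONDecodeError point to an encoding mismatch rather than a model failure. A common cause on Windows is an input novel saved as GBK/GB2312 while the pipeline reads it as UTF-8 by default. The snippet below is a minimal, hypothetical sketch (not part of GraphRAG; file names are placeholders) for re-encoding the input to UTF-8 before running the indexer:

```python
# Hypothetical helper: convert a GBK/GB2312-encoded novel to UTF-8 so the
# GraphRAG text loader reads the Chinese characters correctly.
from pathlib import Path

src = Path("input/fortress_besieged_gbk.txt")   # placeholder source file
dst = Path("input/fortress_besieged_utf8.txt")  # placeholder output file

raw = src.read_bytes()
for enc in ("utf-8", "gb18030", "gbk"):         # try the likely encodings in order
    try:
        text = raw.decode(enc)
        print(f"decoded with {enc}")
        break
    except UnicodeDecodeError:
        continue
else:
    raise SystemExit("could not decode the file with the encodings tried")

dst.write_text(text, encoding="utf-8")          # write back as UTF-8 for indexing
```

Alternatively, if the file really is GBK-encoded, setting the input file_encoding in settings.yaml to match may also help (assuming your GraphRAG version exposes that option).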
@Pandas886 Pandas886 changed the title text files that do not support Chinese content [Serious bug]text files that do not support Chinese content Jul 8, 2024
@Pandas886 Pandas886 changed the title [Serious bug]text files that do not support Chinese content [Serious bug] text files that do not support Chinese content Jul 8, 2024
@zhouxihong1

I also tried using Chinese text, and it generated normally with UTF-8 characters. Entities and relationships were also generated correctly, including the graph. However, the Chinese characters in the intermediate output appear as Unicode escape sequences. I hope this can be changed to output readable characters; it appears to be a character-encoding issue rather than a real failure.
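As a side note on the escaped characters: this matches Python's default JSON serialization, which escapes non-ASCII text unless ensure_ascii=False is passed. A tiny illustration (not GraphRAG code) of the difference:

```python
import json

record = {"title": "围城"}

# Default serialization escapes non-ASCII characters to \uXXXX sequences,
# which is what the intermediate output looks like.
print(json.dumps(record))                      # {"title": "\u56f4\u57ce"}

# Passing ensure_ascii=False keeps the Chinese characters readable.
print(json.dumps(record, ensure_ascii=False))  # {"title": "围城"}
```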

@KylinMountain
Contributor

Yeah, I am able to index a Chinese web novel.
You can refer to the guide on my WeChat official account (weixin gongzhonghao).

I ran it on the web novel 仙逆 and it completed successfully.

@xxWeiDG

xxWeiDG commented Jul 9, 2024

Yeah, I am able to index a Chinese web novel. You can refer to the guide on my WeChat official account (weixin gongzhonghao).

I ran it on the web novel 仙逆 and it completed successfully.

May I ask, do you know how to fix this error?
(screenshot of the error attached)

@sipie800

sipie800 commented Jul 9, 2024

Same here. It appears randomly.

@Lincolnwill

May I ask, do you know how to fix this error?

Check your logs. It is likely that an "Error Invoking LLM" from the model caused a ReadTimeout, which ultimately surfaces as KeyError: 'community'.
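If the ReadTimeout is the trigger, raising the LLM timeout and retry budget in settings.yaml is one mitigation. A hedged sketch, assuming the settings layout of GraphRAG releases from this period (verify the key names against your version):

```yaml
# settings.yaml (excerpt) - assumed key names; check against your GraphRAG version.
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: gpt-4o-mini          # placeholder model name
  model_supports_json: true
  request_timeout: 300.0      # raise this if requests time out
  max_retries: 10
  concurrent_requests: 5      # fewer parallel calls also reduces timeouts
```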

@natoverse
Collaborator

Consolidating language support issues here: #696

@natoverse natoverse closed this as not planned Jul 25, 2024
@natoverse natoverse added the community_support (Issue handled by community members) label Jul 25, 2024
@1249815869

May I ask, do you know how to fix this error?

Check your logs. It is likely that an "Error Invoking LLM" from the model caused a ReadTimeout, which ultimately surfaces as KeyError: 'community'.

In my case the timeout error occurs during the text_embed verb of create_final_entities. How did you solve it?

@rmd1710714107

May I ask, do you know how to fix this error?

Check your logs. It is likely that an "Error Invoking LLM" from the model caused a ReadTimeout, which ultimately surfaces as KeyError: 'community'.

In my case the timeout error occurs during the text_embed verb of create_final_entities. How did you solve it?

I have the same error. Have you resolved it?

@1249815869

May I ask, do you know how to fix this error?

Check your logs. It is likely that an "Error Invoking LLM" from the model caused a ReadTimeout, which ultimately surfaces as KeyError: 'community'.

In my case the timeout error occurs during the text_embed verb of create_final_entities. How did you solve it?

I have the same error. Have you resolved it?

I tried adjusting the embeddings section of the settings.yaml file.
First, start an embedding model service with FastChat; the reference commands are as follows:

# Start the controller
python -m fastchat.serve.controller --host 127.0.0.1 --port 21001

# Start the model_worker
python -m fastchat.serve.model_worker --device cpu --model-names bge-m3 --model-path D:/Models/embedding/bge-m3 --controller-address http://127.0.0.1:21001 --worker-address http://127.0.0.1:8080 --host 0.0.0.0 --port 8080 

# Start the OpenAI-compatible API server
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 9000

A reference settings.yaml configuration is shown below:
(screenshot of the settings.yaml configuration attached)
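Since the screenshot is not reproduced here, the following is a rough, hedged sketch of what the embeddings section could look like when pointed at the FastChat OpenAI-compatible endpoint started above (key names assume the contemporary GraphRAG settings layout; adjust for your version):

```yaml
# settings.yaml (excerpt) - assumed key names; check against your GraphRAG version.
embeddings:
  async_mode: threaded
  llm:
    api_key: EMPTY                      # FastChat typically ignores the key; placeholder
    type: openai_embedding
    model: bge-m3                       # must match --model-names of the worker
    api_base: http://127.0.0.1:9000/v1  # address of fastchat.serve.openai_api_server
    request_timeout: 300.0              # larger timeout to avoid ReadTimeout during text_embed
```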
