[Serious bug] text files that do not support Chinese content #424

Closed
Pandas886 opened this issue Jul 8, 2024 · 9 comments
Labels
community_support Issue handled by community members

Comments

@Pandas886

Pandas886 commented Jul 8, 2024

I attempted to run a RAG test using Qian Zhongshu's "Fortress Besieged" and encountered the following errors.

The pipeline output:

❌ create_final_community_reports
None
⠋ GraphRAG Indexer 
├── Loading Input (text) - 1 files loaded (0 filtered) ----- 100% 0:00:… 0:00:…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports❌ Errors occurred during the pipeline run, see logs for more details.

The logs show the following:

10:54:04,391 graphrag.llm.openai.utils ERROR error loading json, json=```json
{
    "title": "<garbled Chinese text (mojibake)>",
    "summary": "<garbled Chinese text (mojibake)>",
    "rating": 6.5,
    "rating_explanation": "<garbled Chinese text (mojibake)>",
    "findings": [
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (282), Relationships (639, 1316, 1323, 1317, 1320, 1325, 1321, 1318, 1324, 1322)]"
        },
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (122), Relationships (681, 1075, 276, 1073, 1079, 1071, 1063, 1076, 1074, 1077, 1078)]"
        },
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (282), Relationships (1316, 1321, 1318, 1320)]"
        },
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (122), Relationships (681, 276, 1074)]"
        },
        {
            "summary": "<garbled Chinese text (mojibake)>",
            "explanation": "<garbled Chinese text (mojibake)> [Data: Entities (282), Relationships (449, 540, 454, 1071, 1063, 1076, 1074, 1077, 1078)]"
        }
    ]
}
      
Traceback (most recent call last):
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
10:54:04,391 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\index\graph\extractors\community_reports\community_reports_extractor.py", line 58, in __call__
    await self._llm(
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\json_parsing_llm.py", line 34, in __call__
    result = await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_token_replacing_llm.py", line 37, in __call__
    return await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_history_tracking_llm.py", line 33, in __call__
    output = await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\caching_llm.py", line 104, in __call__
    result = await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 177, in __call__
    result, start = await execute_with_retry()
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 159, in execute_with_retry
    async for attempt in retryer:
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\asyncio\__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
    result = await action(retry_state)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 165, in execute_with_retry
    return await do_attempt(), start
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 147, in do_attempt
    return await self._delegate(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\base_llm.py", line 48, in __call__
    return await self._invoke_json(input, **kwargs)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 82, in _invoke_json
    result = await generate()
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 74, in generate
    await self._native_json(input, **{**kwargs, "name": call_name})
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 108, in _native_json
    json_output = try_parse_json_object(raw_output)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
10:54:04,394 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
10:54:04,394 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 71
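The replacement characters in the report above and the resulting JSONDecodeError point to an encoding mismatch rather than a model failure. A common cause on Windows is an input novel saved as GBK/GB2312 while the pipeline reads it as UTF-8 by default. The snippet below is a minimal, hypothetical sketch (not part of GraphRAG; file names are placeholders) for re-encoding the input to UTF-8 before running the indexer:

```python
# Hypothetical helper: convert a GBK/GB2312-encoded novel to UTF-8 so the
# GraphRAG text loader reads the Chinese characters correctly.
from pathlib import Path

src = Path("input/fortress_besieged_gbk.txt")   # placeholder source file
dst = Path("input/fortress_besieged_utf8.txt")  # placeholder output file

raw = src.read_bytes()
for enc in ("utf-8", "gb18030", "gbk"):         # try the likely encodings in order
    try:
        text = raw.decode(enc)
        print(f"decoded with {enc}")
        break
    except UnicodeDecodeError:
        continue
else:
    raise SystemExit("could not decode the file with the encodings tried")

dst.write_text(text, encoding="utf-8")          # write back as UTF-8 for indexing
```

Alternatively, if the file really is GBK-encoded, setting the input file_encoding in settings.yaml to match may also help (assuming your GraphRAG version exposes that option).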
@Pandas886 Pandas886 changed the title text files that do not support Chinese content [Serious bug]text files that do not support Chinese content Jul 8, 2024
@Pandas886 Pandas886 changed the title [Serious bug]text files that do not support Chinese content [Serious bug] text files that do not support Chinese content Jul 8, 2024
@zhouxihong1

I also tried using Chinese text, and it generated normally with UTF-8 characters. Entities and relationships were also generated correctly, including the graph. However, the Chinese characters in the intermediate output appear as Unicode escape sequences. I hope this can be changed to output readable characters; it appears to be a character-encoding issue rather than a real failure.
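As a side note on the escaped characters: this matches Python's default JSON serialization, which escapes non-ASCII text unless ensure_ascii=False is passed. A tiny illustration (not GraphRAG code) of the difference:

```python
import json

record = {"title": "围城"}

# Default serialization escapes non-ASCII characters to \uXXXX sequences,
# which is what the intermediate output looks like.
print(json.dumps(record))                      # {"title": "\u56f4\u57ce"}

# Passing ensure_ascii=False keeps the Chinese characters readable.
print(json.dumps(record, ensure_ascii=False))  # {"title": "围城"}
```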

@KylinMountain
Contributor

Yeah, I am able to index a Chinese web novel.
You can refer to the guide on my WeChat official account (weixin gongzhonghao).

I ran it on the web novel 仙逆 and it completed successfully.

@xxWeiDG

xxWeiDG commented Jul 9, 2024

Yeah, I am able to index a Chinese web novel. You can refer to the guide on my WeChat official account (weixin gongzhonghao).

I ran it on the web novel 仙逆 and it completed successfully.

May I ask, do you know how to fix this error?
(screenshot of the error attached)

@sipie800

sipie800 commented Jul 9, 2024

Same here. It appears randomly.

@Lincolnwill

May I ask, do you know how to fix this error?

Check your logs. It is likely that an "Error Invoking LLM" from the model caused a ReadTimeout, which ultimately surfaces as KeyError: 'community'.
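If the ReadTimeout is the trigger, raising the LLM timeout and retry budget in settings.yaml is one mitigation. A hedged sketch, assuming the settings layout of GraphRAG releases from this period (verify the key names against your version):

```yaml
# settings.yaml (excerpt) - assumed key names; check against your GraphRAG version.
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: gpt-4o-mini          # placeholder model name
  model_supports_json: true
  request_timeout: 300.0      # raise this if requests time out
  max_retries: 10
  concurrent_requests: 5      # fewer parallel calls also reduces timeouts
```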

@natoverse
Collaborator

Consolidating language support issues here: #696

@natoverse natoverse closed this as not planned Jul 25, 2024
@natoverse natoverse added the community_support (Issue handled by community members) label Jul 25, 2024
@1249815869

May I ask, do you know how to fix this error?

Check your logs. It is likely that an "Error Invoking LLM" from the model caused a ReadTimeout, which ultimately surfaces as KeyError: 'community'.

In my case the timeout error occurs during the text_embed verb of create_final_entities. How did you solve it?

@rmd1710714107

May I ask, do you know how to fix this error?

Check your logs. It is likely that an "Error Invoking LLM" from the model caused a ReadTimeout, which ultimately surfaces as KeyError: 'community'.

In my case the timeout error occurs during the text_embed verb of create_final_entities. How did you solve it?

I have the same error. Have you resolved it?

@1249815869

May I ask, do you know how to fix this error?

Check your logs. It is likely that an "Error Invoking LLM" from the model caused a ReadTimeout, which ultimately surfaces as KeyError: 'community'.

In my case the timeout error occurs during the text_embed verb of create_final_entities. How did you solve it?

I have the same error. Have you resolved it?

I tried adjusting the embeddings section of the settings.yaml file.
First, start an embedding model service with FastChat; the reference commands are as follows:

# Start the controller
python -m fastchat.serve.controller --host 127.0.0.1 --port 21001

# Start the model_worker
python -m fastchat.serve.model_worker --device cpu --model-names bge-m3 --model-path D:/Models/embedding/bge-m3 --controller-address http://127.0.0.1:21001 --worker-address http://127.0.0.1:8080 --host 0.0.0.0 --port 8080 

# Start the OpenAI-compatible API server
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 9000

A reference settings.yaml configuration is shown below:
(screenshot of the settings.yaml configuration attached)
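Since the screenshot is not reproduced here, the following is a rough, hedged sketch of what the embeddings section could look like when pointed at the FastChat OpenAI-compatible endpoint started above (key names assume the contemporary GraphRAG settings layout; adjust for your version):

```yaml
# settings.yaml (excerpt) - assumed key names; check against your GraphRAG version.
embeddings:
  async_mode: threaded
  llm:
    api_key: EMPTY                      # FastChat typically ignores the key; placeholder
    type: openai_embedding
    model: bge-m3                       # must match --model-names of the worker
    api_base: http://127.0.0.1:9000/v1  # address of fastchat.serve.openai_api_server
    request_timeout: 300.0              # larger timeout to avoid ReadTimeout during text_embed
```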
