[Issue]: Error while creating JSON with vLLM using Pixtral or Mistral Small: MistralTokenizer problem #1400

balezeauquentin commented Nov 12, 2024

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

While creating JSON output, I encountered an internal error from LiteLLM. After reviewing my logs, I identified the issue as originating from vLLM. Here is the specific log message:

AttributeError: 'MistralTokenizer' object has no attribute 'eos_token'. Did you mean: 'eos_token_id'?

I know that both Pixtral and Mistral Small can generate JSON output, so the problem might be related to my vLLM or LiteLLM configuration.
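
To check whether the 500 comes from vLLM itself rather than from my LiteLLM or GraphRAG configuration, a minimal isolation test is to send a JSON-mode request straight to the vLLM OpenAI endpoint, bypassing LiteLLM. This is only a sketch: it assumes the official openai Python client, and the base URL and model name are placeholders for my deployment (GraphRAG's model_supports_json: true presumably maps to the response_format parameter below):

from openai import OpenAI

# Placeholders: point base_url at the vLLM server directly, not at the LiteLLM proxy.
client = OpenAI(base_url="http://<vllm-host>:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="mistralai/Pixtral-12B-2409",  # or whichever Pixtral / Mistral Small model vLLM is serving
    messages=[{"role": "user", "content": "Return a JSON object with a single key 'ok'."}],
    response_format={"type": "json_object"},  # JSON mode: this is what routes the request into vLLM's guided decoding
    max_tokens=100,
)
print(resp.choices[0].message.content)

If this request reproduces the 500 shown in the logs below, the problem is on the vLLM side rather than in the proxy layer.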

Steps to reproduce

Architecture:
[architecture diagram: GraphRAG → LiteLLM proxy → vLLM]

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: ${LLM_MODEL}
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 8000
  request_timeout: 600.0
  api_base: ${API_BASE}
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 1000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: ${EMBEDDING_MODEL}
    api_base: ${API_BASE}
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.mmd$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 1000
  max_input_length: 5000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Logs and screenshots

vLLM error:

2024-11-12T10:17:51.514303464Z INFO:     172.19.0.61:52538 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
2024-11-12T10:17:51.515519674Z ERROR:    Exception in ASGI application
2024-11-12T10:17:51.515533153Z Traceback (most recent call last):
2024-11-12T10:17:51.515539575Z   File "/usr/local/lib/python3.12/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
2024-11-12T10:17:51.515545851Z     result = await app(  # type: ignore[func-returns-value]
2024-11-12T10:17:51.515551006Z              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515556131Z   File "/usr/local/lib/python3.12/dist-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
2024-11-12T10:17:51.515561814Z     return await self.app(scope, receive, send)
2024-11-12T10:17:51.515566844Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515571880Z   File "/usr/local/lib/python3.12/dist-packages/fastapi/applications.py", line 1054, in __call__
2024-11-12T10:17:51.515577501Z     await super().__call__(scope, receive, send)
2024-11-12T10:17:51.515582666Z   File "/usr/local/lib/python3.12/dist-packages/starlette/applications.py", line 113, in __call__
2024-11-12T10:17:51.515588048Z     await self.middleware_stack(scope, receive, send)
2024-11-12T10:17:51.515593126Z   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 187, in __call__
2024-11-12T10:17:51.515598554Z     raise exc
2024-11-12T10:17:51.515603558Z   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 165, in __call__
2024-11-12T10:17:51.515608999Z     await self.app(scope, receive, _send)
2024-11-12T10:17:51.515614064Z   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/cors.py", line 85, in __call__
2024-11-12T10:17:51.515619502Z     await self.app(scope, receive, send)
2024-11-12T10:17:51.515624434Z   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
2024-11-12T10:17:51.515629995Z     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-11-12T10:17:51.515635674Z   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
2024-11-12T10:17:51.515641452Z     raise exc
2024-11-12T10:17:51.515646279Z   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
2024-11-12T10:17:51.515651885Z     await app(scope, receive, sender)
2024-11-12T10:17:51.515656912Z   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 715, in __call__
2024-11-12T10:17:51.515662442Z     await self.middleware_stack(scope, receive, send)
2024-11-12T10:17:51.515689216Z   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 735, in app
2024-11-12T10:17:51.515694897Z     await route.handle(scope, receive, send)
2024-11-12T10:17:51.515700090Z   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 288, in handle
2024-11-12T10:17:51.515706599Z     await self.app(scope, receive, send)
2024-11-12T10:17:51.515711703Z   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 76, in app
2024-11-12T10:17:51.515717007Z     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-11-12T10:17:51.515722179Z   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
2024-11-12T10:17:51.515727623Z     raise exc
2024-11-12T10:17:51.515732321Z   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
2024-11-12T10:17:51.515737923Z     await app(scope, receive, sender)
2024-11-12T10:17:51.515742767Z   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 73, in app
2024-11-12T10:17:51.515748199Z     response = await f(request)
2024-11-12T10:17:51.515753147Z                ^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515757951Z   File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 301, in app
2024-11-12T10:17:51.515763341Z     raw_response = await run_endpoint_function(
2024-11-12T10:17:51.515768395Z                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515773313Z   File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 212, in run_endpoint_function
2024-11-12T10:17:51.515778834Z     return await dependant.call(**values)
2024-11-12T10:17:51.515783706Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515788668Z   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 315, in create_chat_completion
2024-11-12T10:17:51.515794262Z     generator = await chat(raw_request).create_chat_completion(
2024-11-12T10:17:51.515799326Z                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515804400Z   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 268, in create_chat_completion
2024-11-12T10:17:51.515810220Z     return await self.chat_completion_full_generator(
2024-11-12T10:17:51.515815275Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515820521Z   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 624, in chat_completion_full_generator
2024-11-12T10:17:51.515826157Z     async for res in result_generator:
2024-11-12T10:17:51.515831015Z   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 458, in iterate_with_cancellation
2024-11-12T10:17:51.515836463Z     item = await awaits[0]
2024-11-12T10:17:51.515841344Z            ^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515851563Z   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/client.py", line 547, in _process_request
2024-11-12T10:17:51.515857166Z     params = await \
2024-11-12T10:17:51.515862411Z              ^^^^^^^
2024-11-12T10:17:51.515868199Z   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 528, in build_guided_decoding_logits_processor_async
2024-11-12T10:17:51.515874396Z     processor = await get_guided_decoding_logits_processor(
2024-11-12T10:17:51.515879511Z                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515884598Z   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/guided_decoding/__init__.py", line 14, in get_guided_decoding_logits_processor
2024-11-12T10:17:51.515890890Z     return await get_outlines_guided_decoding_logits_processor(
2024-11-12T10:17:51.515896076Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515901427Z   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/guided_decoding/outlines_decoding.py", line 72, in get_outlines_guided_decoding_logits_processor
2024-11-12T10:17:51.515907646Z     return await loop.run_in_executor(global_thread_pool,
2024-11-12T10:17:51.515912688Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515917771Z   File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
2024-11-12T10:17:51.515923039Z     result = self.fn(*self.args, **self.kwargs)
2024-11-12T10:17:51.515927901Z              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515932943Z   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/guided_decoding/outlines_decoding.py", line 131, in _get_logits_processor
2024-11-12T10:17:51.515938547Z     return CFGLogitsProcessor(guide, tokenizer)
2024-11-12T10:17:51.515943396Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515948358Z   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 165, in __init__
2024-11-12T10:17:51.515954053Z     super().__init__(CFGLogitsProcessor._get_guide(cfg, tokenizer))
2024-11-12T10:17:51.515959369Z                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515964563Z   File "/usr/local/lib/python3.12/dist-packages/outlines/caching.py", line 122, in wrapper
2024-11-12T10:17:51.515969923Z     result = cached_function(*args, **kwargs)
2024-11-12T10:17:51.515974792Z              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515979771Z   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 152, in _get_guide
2024-11-12T10:17:51.515985344Z     return CFGGuide(cfg, tokenizer)
2024-11-12T10:17:51.515990418Z            ^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.515995255Z   File "/usr/local/lib/python3.12/dist-packages/outlines/fsm/guide.py", line 272, in __init__
2024-11-12T10:17:51.516005306Z     self.terminal_regexps["$END"] = tokenizer.eos_token
2024-11-12T10:17:51.516011014Z                                     ^^^^^^^^^^^^^^^^^^^
2024-11-12T10:17:51.516017066Z AttributeError: 'MistralTokenizer' object has no attribute 'eos_token'. Did you mean: 'eos_token_id'?
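
For context on the trace: the failure happens inside vLLM's outlines-based guided decoding, where outlines' CFGGuide reads tokenizer.eos_token as a string terminal, while vLLM's MistralTokenizer wrapper apparently exposes only eos_token_id. A purely illustrative sketch of that mismatch (stub classes, not the real vllm/outlines types):

# Stubs that mimic the attribute mismatch from the traceback above.
class HFStyleTokenizer:
    eos_token = "</s>"   # outlines' CFGGuide expects this string attribute for its "$END" terminal
    eos_token_id = 2

class MistralStyleTokenizer:
    eos_token_id = 2     # only the id is exposed, no eos_token string

for tok in (HFStyleTokenizer(), MistralStyleTokenizer()):
    try:
        print(tok.eos_token)
    except AttributeError as err:
        print(err)       # -> 'MistralStyleTokenizer' object has no attribute 'eos_token'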

GraphRAG error:

{
    "type": "error",
    "data": "Error Invoking LLM",
    "stack": "Traceback (most recent call last):\n  File \"/home/quentin/.venv/graphrag/lib/python3.12/site-packages/graphrag/llm/base/base_llm.py\", line 54, in _invoke\n    output = await self._execute_llm(input, **kwargs)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/home/quentin/.venv/graphrag/lib/python3.12/site-packages/graphrag/llm/openai/openai_chat_llm.py\", line 53, in _execute_llm\n    completion = await self.client.chat.completions.create(\n                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/home/quentin/.venv/graphrag/lib/python3.12/site-packages/openai/resources/chat/completions.py\", line 1412, in create\n    return await self._post(\n           ^^^^^^^^^^^^^^^^^\n  File \"/home/quentin/.venv/graphrag/lib/python3.12/site-packages/openai/_base_client.py\", line 1831, in post\n    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/home/quentin/.venv/graphrag/lib/python3.12/site-packages/openai/_base_client.py\", line 1525, in request\n    return await self._request(\n           ^^^^^^^^^^^^^^^^^^^^\n  File \"/home/quentin/.venv/graphrag/lib/python3.12/site-packages/openai/_base_client.py\", line 1626, in _request\n    raise self._make_status_error_from_response(err.response) from None\nopenai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: OpenAIException - Internal Server Error\\nReceived Model Group=LITELLM2VLLM-pixtral-12b\\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}\n",
    "source": "Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: OpenAIException - Internal Server Error\\nReceived Model Group=LITELLM2VLLM-pixtral-12b\\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}",

Additional Information
