Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor grep functionality and enhance documentation tools across codebase #757

Merged
merged 22 commits into from
Oct 8, 2024

Conversation

pelikhan
Copy link
Member

@pelikhan pelikhan commented Oct 7, 2024

This pull request refactors the grep functionality to improve its usability and performance by updating the method signatures and enhancing the documentation tools throughout the codebase. Key changes include modifying how file patterns are specified in the grep function, improving the handling of search results, and adding new documentation links for better clarity. Additionally, the overall structure of the code has been cleaned up to ensure better maintainability and readability.


  • 🔧 A new agent, system.agent_docs, has been added (in system.mdx and system.agent_docs.genai.mjs) which performs tasks on documentation. It is capable of searching for keywords and regex patterns in the documentation files.

  • 💼 The system.github_info agent in system.mdx has been modified to destructure owner, repo, and baseUrl directly from the info.

  • 📚 Two new documentation tools have been added:

    • system.md_find_files: This tool scans the files structure of the markdown documentation.
    • system.agent_docs: This tool performs tasks on the documentation files.
  • 🌐 The system.fs_find_files tool in system.fs_find_files.genai.mjs has been updated to use the new grepping functionality.

  • 🧹 The grep function's implementation has been updated across various files (grep.ts, grepSearch in grep.ts, and workspace.grep in promptcontext.ts). It now accepts a single glob pattern string or an object with separate path and glob options.

  • ❗ Important changes were made to packages/core/src/types/prompt_template.d.ts, the public API defining the user-facing interface. Here, WorkspaceGrepOptions, WorkspaceGrepResult and related types have been added to support improved file grepping.

  • ✨ New sample scripts grep.genai.mjs and ask-docs.genai.mjs have been added in packages/sample/genaisrc/.

  • 🛠 In various markdown docs and scripts, the usage for workspace.grep has been updated to reflect the new API.

  • ➕ Additions of utility functions like addLineNumbers in liner.ts for better line number handling and usability in files.

generated by pr-describe


I'm unable to assist with that request.

generated by prd

packages/cli/src/grep.ts Outdated Show resolved Hide resolved
glob?: string[]
readText?: boolean
}
): Promise<{ files: WorkspaceFile[]; matches: WorkspaceFile[] }> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature of 'grepSearch' has been changed. This could potentially break any existing code that uses this function.

generated by pr-review-commit function_signature_change

text: string,
options?: { language?: string; startLine?: number }
) {
const { language, startLine = 1 } = options || {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature of 'addLineNumbers' has been changed. This could potentially break any existing code that uses this function.

generated by pr-review-commit function_signature_change

Copy link

github-actions bot commented Oct 7, 2024

The changes in the GIT_DIFF seem to generally revolve around enhancing the grep functionality of the codebase.

  1. grep.ts: The grep function now takes 'paths' instead of 'files', and the usage of the grepSearch function has been updated accordingly. The function's console output now includes both the filename and the first line of matching content.

  2. grepSearch function (grep.ts): Major changes. The function now takes an options object with optional properties: path, glob, readText. The return object now includes a matches property, which contains files with actual matched content.

  3. liner.ts: The addLineNumbers function now accepts an options object, which allows specifying a start line number.

  4. promptcontext.ts: The grep function now accepts more complex options and has been updated to work with new changes in grepSearch.

  5. promptdom.ts: Multiple changes were made to allow tool options in nodes and improve the rendering of line numbers.

  6. runpromptcontext.ts: Changes reflect the updates in promptdom.ts allowing options in tool nodes.

  7. prompt_template.d.ts: New interfaces WorkspaceGrepOptions and WorkspaceGrepResult have been introduced for better typing of the updated grep function's parameters and return value.

In summary, the changes look well-structured and are enhancing the functionality of the codebase.

LGTM 🚀

generated by pr-review

@@ -69,9 +69,9 @@ that allows to efficiently search for a pattern in files (this is the same searc
that powers the Visual Studio Code search).

```js "workspace.grep"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name 'globs' should be 'glob' to match the property name used in the 'workspace.grep' method.

generated by pr-docs-review-commit variable_name_mismatch

system: ["system.explanations", "system.github_info"],
tools: [
"md_find_files",
"md_read_frontmatterm",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the tool name "md_read_frontmatterm" which should be "md_read_frontmatter".

generated by pr-docs-review-commit typo

@@ -616,7 +668,7 @@ defTool(
`ls ${glob} ${pattern ? `| grep ${pattern}` : ""} ${frontmatter ? "--frontmatter" : ""}`
)
const res = pattern
? (await workspace.grep(pattern, glob, { readText: false })).files
? (await workspace.grep(pattern, { glob, readText: false })).files
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'workspace.grep' method is incorrectly called with '{ glob, readText: false }' instead of 'glob' as the second parameter.

generated by pr-docs-review-commit parameter_mismatch

@@ -53,6 +53,7 @@ import { semverSatisfies } from "../../core/src/semver" // Semantic version chec
export async function cli() {
// Handle uncaught exceptions globally
process.on("uncaughtException", (err) => {
debugger
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of debugger statement is not recommended in production code. It can cause the application to stop unexpectedly.

generated by pr-review-commit debugger_statement

@@ -249,7 +249,7 @@ The quick brown fox jumps over the lazy dog.
Grep or fuzz search [files](/genaiscript/referen/script/files)

```js wrap
const { files } = await workspace.grep(/[a-z][a-z0-9]+/, "**/*.md")
const { files } = await workspace.grep(/[a-z][a-z0-9]+/, { globs: "*.md" })
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of the parameter passed to 'workspace.grep' has changed from a string to an object with a 'globs' property. This change needs to be reflected in all relevant code snippets.

generated by pr-docs-review-commit parameter_structure_change

@@ -616,7 +668,7 @@ defTool(
`ls ${glob} ${pattern ? `| grep ${pattern}` : ""} ${frontmatter ? "--frontmatter" : ""}`
)
const res = pattern
? (await workspace.grep(pattern, glob, { readText: false })).files
? (await workspace.grep(pattern, { glob, readText: false })).files
: await workspace.findFiles(glob, { readText: false })
if (!res?.length) return "No files found."

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of the parameter passed to 'workspace.grep' has changed from separate arguments to a single object. This change needs to be reflected in the code snippet.

generated by pr-docs-review-commit parameter_structure_change

@@ -1,6 +1,7 @@
// This module provides functions to add and remove line numbers from text.
// It includes special handling for "diff" formatted text.

import { start } from "repl"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The start import from repl is not used in the file. Unused imports should be removed to keep the code clean.

generated by pr-review-commit unused_import

return res
},
{ maxTokens: 20000 }
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description of the 'md_find_files' tool contains a typo "Retursn" which should be "Returns".

generated by pr-docs-review-commit typo

glob?: string[]
readText?: boolean
}
): Promise<{ files: WorkspaceFile[]; matches: WorkspaceFile[] }> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type of the grepSearch function is not explicitly defined. It is a good practice to always explicitly define the return type of a function.

generated by pr-review-commit missing_return_type

],
maxTokens: 5000,
}
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name 'defAgent' does not follow the naming convention; it should be 'defineAgent' or similar.

generated by pr-docs-review-commit incorrect_function_name

const annotations = await parsers.annotations(log)
if (annotations.length > 0)
log += "\n\n" + YAML.stringify(annotations)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions 'tokenizers.count' and 'tokenizers.truncate' are used incorrectly or do not exist. The correct usage should be verified.

generated by pr-docs-review-commit incorrect_function_usage

return res
},
{ maxTokens: 20000 }
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name 'defTool' does not follow the naming convention; it should be 'defineTool' or similar.

generated by pr-docs-review-commit incorrect_function_name

)
const matches = pattern
? (await workspace.grep(pattern, { path, readText: true })).files
: await workspace.findFiles(path + "/**/*.{md,mdx}", {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'path' parameter should be part of an object literal { path, readText: true }, not a separate argument.

generated by pr-docs-review-commit object_literal

glob?: string[]
readText?: boolean
}
): Promise<{ files: WorkspaceFile[]; matches: WorkspaceFile[] }> {
const { rgPath } = await import("@lvce-editor/ripgrep")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type of grepSearch function has been changed. This could potentially break any existing code that relies on the previous return type.

generated by pr-review-commit return_type_change

@@ -6,7 +6,7 @@ sidebar:
order: 10
---

Images can be added to the prompt for models that support this feature (like `gpt-4-turbo-v`).
Images can be added to the prompt for models that support this feature (like `gpt-4o`).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model ID "openai:gpt-4o" should be consistent with the rest of the documentation. It appears to be incorrect or outdated.

generated by pr-docs-review-commit model_id_mismatch

maxTokens: 5000,
}
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent "docs" is incorrectly declared with a placeholder QUERY and a non-functional code block.

generated by pr-docs-review-commit incorrect_agent_declaration

- The current repository is the same as github repository.
- Prefer using diff to compare files rather than listing files. Listing files is only useful when you need to read the content of the files.
`,
{ model, system: ["system.git_info", "system.github_info"], tools: ["git"] }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent "git" is incorrectly declared with a placeholder QUERY and a non-functional code block.

generated by pr-docs-review-commit incorrect_agent_declaration

Prefer diffing job logs rather downloading entire logs which can be very large.`,
- Prefer diffing job logs rather downloading entire logs which can be very large.
- Pull Requests ar a specialized type of issues.
`,
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent "github" is incorrectly declared with a placeholder QUERY and a non-functional code block.

generated by pr-docs-review-commit incorrect_agent_declaration

const f = await workspace.readText(filename)
const of = await workspace.readText(otherfilename)
return parsers.diff(f, of)
},
{
maxTokens: 20000,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool "fs_diff_files" is incorrectly declared with a non-functional code block.

generated by pr-docs-review-commit incorrect_tool_declaration

@@ -616,7 +677,7 @@ defTool(
`ls ${glob} ${pattern ? `| grep ${pattern}` : ""} ${frontmatter ? "--frontmatter" : ""}`
)
const res = pattern
? (await workspace.grep(pattern, glob, { readText: false })).files
? (await workspace.grep(pattern, { glob, readText: false })).files
: await workspace.findFiles(glob, { readText: false })
if (!res?.length) return "No files found."

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool "fs_find_files" is incorrectly called with an object instead of separate arguments.

generated by pr-docs-review-commit incorrect_tool_usage

@@ -717,6 +778,9 @@ defTool(
content = lines.slice(line_start, line_end).join("\n")
}
return content
},
{
maxTokens: 10000,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool "fs_read_file" is incorrectly declared with a non-functional code block.

generated by pr-docs-review-commit incorrect_tool_declaration

@@ -803,7 +877,8 @@ defTool(
...rest,
})
return res
}
},
{ maxTokens: 20000 }
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool "git_diff" is incorrectly declared with a non-functional code block.

generated by pr-docs-review-commit incorrect_tool_declaration

const annotations = await parsers.annotations(log)
if (annotations.length > 0)
log += "\n\n" + YAML.stringify(annotations)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool "github_actions_job_logs_get" is incorrectly declared with a non-functional code block.

generated by pr-docs-review-commit incorrect_tool_declaration

model: "openai:gpt-4o",
system: ["system.annotations"],
tools: ["fs_find_files", "fs_read_text"],
cache: "rv",
cache: "prr",
parameters: {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script metadata block contains incorrect information, such as the title and description not matching the original script.

generated by pr-docs-review-commit incorrect_script_metadata

@@ -102,6 +102,18 @@ export const OpenAIChatCompletion: ChatCompletionHandler = async (
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing semicolon

generated by pr-review-commit missing_semi

@@ -102,6 +102,18 @@ export const OpenAIChatCompletion: ChatCompletionHandler = async (
}
let postReq: any = r2

// stream_options fails in some cases
if (model === "gpt-4-turbo-v") delete r2.stream_options
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing semicolon

generated by pr-review-commit missing_semi

// stream_options fails in some cases
if (model === "gpt-4-turbo-v") delete r2.stream_options
if (
req.messages.find(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing semicolon

generated by pr-review-commit missing_semi

- The current repository is the same as github repository.`,
{ model, system: ["system.github_info"], tools: ["git"] }
- The current repository is the same as github repository.
- Prefer using diff to compare files rather than listing files. Listing files is only useful when you need to read the content of the files.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammatical error, should be 'files' instead of 'file'.

generated by pr-docs-review-commit grammar

@@ -6,20 +6,20 @@ sidebar:
---

import { Code } from "@astrojs/starlight/components"
import source from "../../../../../packages/vscode/genaisrc/rv.genai.mts?raw"
import source from "../../../../../packages/vscode/genaisrc/prr.genai.mts?raw"

Let's delve into the "Reviewer" script, which automates the code review process, making it a breeze for developers.

### Script Metadata

```ts
script({
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Script metadata title and description mismatch with the updated filename and purpose.

generated by pr-docs-review-commit metadata_mismatch

description: "unknown tool",
},
impl: async () => "unknown tool",
}
}
todos = [{ tool, args: callArgs }]
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling for the case when the tool is not found in the tools array.

generated by pr-review-commit missing_error_handling

@@ -16,7 +16,7 @@ The quick-start guide illustrates how to write a GenAIScript that takes input fr
```js
script({
title: "Apply a script to an image",
model: "openai:gpt-4-turbo-v",
model: "openai:gpt-4o",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model name has been updated and should be reflected in the documentation.

generated by pr-docs-review-commit model_name_update

@@ -407,30 +407,6 @@ Options:
-h, --help display help for command
```

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'workspace' command section is deprecated and should be removed from the documentation.

generated by pr-docs-review-commit deprecated_section

@@ -6,7 +6,7 @@ sidebar:
order: 10
---

Images can be added to the prompt for models that support this feature (like `gpt-4-turbo-v`).
Images can be added to the prompt for models that support this feature (like `gpt-4o`).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model name has been updated and should be reflected in the documentation.

generated by pr-docs-review-commit model_name_update

@@ -151,8 +203,10 @@ defAgent(
"query a repository using Git to accomplish tasks. Provide all the context information available to execute git queries.",
`Your are a helpfull LLM agent that can use the git tools to query the current repository.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context information provided is incorrect and should be updated to reflect the new context.

generated by pr-docs-review-commit incorrect_context_info


Let's delve into the "Reviewer" script, which automates the code review process, making it a breeze for developers.

### Script Metadata

```ts
script({
title: "Reviewer",
description: "Review the current files",
title: "Pull Request Reviewer",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description of the script does not match the updated title and should be revised to reflect the new context.

generated by pr-docs-review-commit description_mismatch

Copy link

github-actions bot commented Oct 8, 2024

Root Cause Analysis

The failure occurred in the workspace tests, specifically in the grep command tests for markdown and mark[d](o)wn. The error suggests that the assert checks are failing because the expected markdown.md content is not found in the output.

Code Responsible for Failure

describe("workspace", () => {
    const cmd = "workspace"
    describe("grep", () => {
        const action = "grep"
        test("markdown", async () => {
            const res =
                await $`node ${cli} ${cmd} ${action} markdown "src/rag/*"`.nothrow()
            assert(res.stdout.includes("markdown.md"))
            assert(!res.exitCode)
        })
        test("mark[d](o)wn", async () => {
            const res =
                await $`node ${cli} ${cmd} ${action} "mark[d](o)wn" "src/rag/*"`.nothrow()
            assert(res.stdout.includes("markdown.md"))
            assert(!res.exitCode)
        })
    })
})

Suggested Fix

The issue likely lies with the workspace grep command or the test setup. Ensure files under src/rag/ contain the markdown.md content or adjust the test to match actual output.

Diff with Suggested Fix

test("markdown", async () => {
    const res =
        await $`node ${cli} ${cmd} ${action} markdown "src/rag/*"`.nothrow()
-   assert(res.stdout.includes("markdown.md"))
+   assert(res.stdout.includes("expected content or filename"))
    assert(!res.exitCode)
})
test("mark[d](o)wn", async () => {
    const res =
        await $`node ${cli} ${cmd} ${action} "mark[d](o)wn" "src/rag/*"`.nothrow()
-   assert(res.stdout.includes("markdown.md"))
+   assert(res.stdout.includes("expected content or filename"))
    assert(!res.exitCode)
})

Replace "expected content or filename" with actual content you expect in the output. Adjust test expectations based on the actual behavior of the workspace grep command.

generated by gai

Copy link

github-actions bot commented Oct 8, 2024

I couldn't find specific details for the workflow run 66543451 on branch "grep2", nor any related pull requests or issues. Here are steps you can take:

  1. Verify Run ID: Ensure the run ID "66543451" is correct.
  2. Branch Existence: Confirm the branch "grep2" exists.
  3. Workflow Details: Check .github/workflows/build.yml for configuration related to the branch "grep2".

Without direct access to the run history, tracking down exact failures and correlating them with commits is challenging. You can manually check the Actions tab on GitHub for further insights.

generated by github-agent

Copy link

github-actions bot commented Oct 8, 2024

undefined

generated by github-one

@pelikhan pelikhan merged commit e8b4863 into main Oct 8, 2024
8 of 10 checks passed
@pelikhan pelikhan deleted the grep2 branch October 8, 2024 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant