ch02: update llm retriever scorer with cohere v2
Showing 2 changed files with 18 additions and 24 deletions.
@@ -1,14 +1,14 @@
 [
   {
-    "role": "SYSTEM",
-    "message": "You are a powerful auditor. Your goal is to score documents based on their relevance to a given question.\n\nThe agent model you are auditing is the following:\n- Agent description: A customer support chatbot for Weights & Biases to answer questions about the Weights & Biases platform, wandb SDK, its integrations, and the weave library.\n\nThe user will provide the context, consisting of multiple documents wrapped in document id tags, for example - <doc_1>, <doc_2>, etc\n\nFirst, score each document on an integer scale of 0 to 2 with the following meanings:\n 0 = represents that the document is irrelevant to the question and cannot be used to answer the question.\n 1 = represents that the document is somewhat relevant to the question and contains some information that could be used to answer the question\n 2 = represents that the document is highly relevant to the question and must be used to answer the question.\n\nScoring Instructions: \nAssign category 2 if the document is entirely related to the question and contains significant facts that can be used to answer the question.\nIf neither of these criteria satisfies the question, give it category 0.\n\nSplit this problem into steps:\nConsider the underlying intent of the question. Measure how well the content matches the likely intent of the question(M).\nMeasure the document's trustworthiness(T) concerning the facts to answer the question.\nConsider the aspects above and their relative importance, then decide on a final score (O).\nFinal scores must be an integer value only.\nDo not provide any code in the result. Provide each score in the following JSON format: \n{\"final_scores\":[{\"id\": <doc_#>, \"relevance\":<integer_score >}, ...}]}"
+    "role": "system",
+    "content": "You are a powerful auditor. Your goal is to score documents based on their relevance to a given question.\n\nThe agent model you are auditing is the following:\n- Agent description: A customer support chatbot for Weights & Biases to answer questions about the Weights & Biases platform, wandb SDK, its integrations, and the weave library.\n\nThe user will provide the context, consisting of multiple documents wrapped in document id tags, for example - <doc_1>, <doc_2>, etc\n\nFirst, score each document on an integer scale of 0 to 2 with the following meanings:\n 0 = represents that the document is irrelevant to the question and cannot be used to answer the question.\n 1 = represents that the document is somewhat relevant to the question and contains some information that could be used to answer the question\n 2 = represents that the document is highly relevant to the question and must be used to answer the question.\n\nScoring Instructions: \nAssign category 2 if the document is entirely related to the question and contains significant facts that can be used to answer the question.\nIf neither of these criteria satisfies the question, give it category 0.\n\nSplit this problem into steps:\nConsider the underlying intent of the question. Measure how well the content matches the likely intent of the question(M).\nMeasure the document's trustworthiness(T) concerning the facts to answer the question.\nConsider the aspects above and their relative importance, then decide on a final score (O).\nFinal scores must be an integer value only.\nDo not provide any code in the result. Provide each score in the following JSON format: \n{\"final_scores\":[{\"id\": <doc_#>, \"relevance\":<integer_score >}, ...}]}"
   },
   {
-    "role": "USER",
-    "message": "<question>\nHow do I programmatically access the human-readable run name?\n</question>\n<doc_0>\nIf you do not explicitly name your run, a random run name will be assigned to the run to help identify the run in the UI. For instance, random run names will look like \"pleasant-flower-4\" or \"misunderstood-glade-2\".\n\nIf you'd like to overwrite the run name (like snowy-owl-10) with the run ID (like qvlp96vk) you can use this snippet:\n\nimport wandb\n\nwandb.init()\nwandb.run.name = wandb.run.id\nwandb.run.save()\n</doc_0>\n</doc_1>\nA single unit of computation logged by W&B is called a run. You can think of a W&B run as an atomic element of your whole project. You should initiate a new run when you:\n - Train a model\n - Change a hyperparameter\n - Use a different model\n - Log data or a model as a W&B Artifact\n - Download a W&B Artifact\n\nFor example, during a sweep, W&B explores a hyperparameter search space that you specify. Each new hyperparameter combination created by the sweep is implemented and recorded as a unique run. \n</doc_1>\n<doc_2>\nThe run name is available in the `.name` attribute of a `wandb.Run`.\nimport wandb\nwandb.init()\nrun_name = wandb.run.name\n</doc_2>\n<doc_3>\nAfter calling `wandb.init()` you can access the random run ID or the human readable run name from your script like this:\n\nUnique run ID (8 character hash): `wandb.run.id`\nRandom run name (human readable): `wandb.run.name`\nIf you're thinking about ways to set useful identifiers for your runs, here's what we recommend:\n\nRun ID: leave it as the generated hash. This needs to be unique across runs in your project.\nRun name: This should be something short, readable, and preferably unique so that you can tell the difference between different lines on your charts.\n</doc_3>\n<doc_4>\nrun-name\nReturns the name of the run\nArgument\n`run`\nA run\nReturn Value\nThe name of the run\nrun-runtime\nReturns the runtime in seconds of the run\nArgument\n`run`\nA run\nReturn Value\nThe runtime in seconds of the run\nrun-summary\nReturns the summary typedDict of the run\n</doc_4>"
+    "role": "user",
+    "content": "<question>\nHow do I programmatically access the human-readable run name?\n</question>\n<doc_0>\nIf you do not explicitly name your run, a random run name will be assigned to the run to help identify the run in the UI. For instance, random run names will look like \"pleasant-flower-4\" or \"misunderstood-glade-2\".\n\nIf you'd like to overwrite the run name (like snowy-owl-10) with the run ID (like qvlp96vk) you can use this snippet:\n\nimport wandb\n\nwandb.init()\nwandb.run.name = wandb.run.id\nwandb.run.save()\n</doc_0>\n</doc_1>\nA single unit of computation logged by W&B is called a run. You can think of a W&B run as an atomic element of your whole project. You should initiate a new run when you:\n - Train a model\n - Change a hyperparameter\n - Use a different model\n - Log data or a model as a W&B Artifact\n - Download a W&B Artifact\n\nFor example, during a sweep, W&B explores a hyperparameter search space that you specify. Each new hyperparameter combination created by the sweep is implemented and recorded as a unique run. \n</doc_1>\n<doc_2>\nThe run name is available in the `.name` attribute of a `wandb.Run`.\nimport wandb\nwandb.init()\nrun_name = wandb.run.name\n</doc_2>\n<doc_3>\nAfter calling `wandb.init()` you can access the random run ID or the human readable run name from your script like this:\n\nUnique run ID (8 character hash): `wandb.run.id`\nRandom run name (human readable): `wandb.run.name`\nIf you're thinking about ways to set useful identifiers for your runs, here's what we recommend:\n\nRun ID: leave it as the generated hash. This needs to be unique across runs in your project.\nRun name: This should be something short, readable, and preferably unique so that you can tell the difference between different lines on your charts.\n</doc_3>\n<doc_4>\nrun-name\nReturns the name of the run\nArgument\n`run`\nA run\nReturn Value\nThe name of the run\nrun-runtime\nReturns the runtime in seconds of the run\nArgument\n`run`\nA run\nReturn Value\nThe runtime in seconds of the run\nrun-summary\nReturns the summary typedDict of the run\n</doc_4>"
   },
   {
-    "role": "CHATBOT",
-    "message": "{\"final_scores\":[{\"id\": 0, \"relevance\":2}, {\"id\": 1, \"relevance\":0}, {\"id\": 2, \"relevance\":2}, {\"id\": 3, \"relevance\":2}, {\"id: 4, \"relevance\":1}]}"
+    "role": "assistant",
+    "content": "{\"final_scores\":[{\"id\": 0, \"relevance\":2}, {\"id\": 1, \"relevance\":0}, {\"id\": 2, \"relevance\":2}, {\"id\": 3, \"relevance\":2}, {\"id: 4, \"relevance\":1}]}"
   }
 ]
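The diff moves the prompt file from Cohere's v1 chat schema (uppercase `SYSTEM`/`USER`/`CHATBOT` roles with a `message` field) to the v2 schema (lowercase `system`/`user`/`assistant` roles with a `content` field), so the list can be passed straight to the v2 chat endpoint. Below is a minimal sketch of how the updated file might be consumed, assuming the official `cohere` Python SDK (v5+); the file path and model name are illustrative, not taken from the repository.

```python
import json

import cohere  # official Cohere Python SDK, which exposes the v2 client

# Load the v2-format message list shown in the diff (path is illustrative).
with open("prompts/llm_retriever_scorer.json") as f:
    messages = json.load(f)

# ClientV2 can also read the key from the CO_API_KEY environment variable.
co = cohere.ClientV2(api_key="YOUR_API_KEY")

# The v2 chat endpoint accepts the messages directly: lowercase roles
# ("system", "user", "assistant") and a "content" field per message.
response = co.chat(model="command-r-plus", messages=messages)

# The reply contains the JSON relevance scores produced by the scorer prompt.
print(response.message.content[0].text)
```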