deep-research v0
acedward committed Feb 12, 2025
1 parent d8f2d70 commit 61484c6
Showing 6 changed files with 970 additions and 0 deletions.
107 changes: 107 additions & 0 deletions tools/deep-research/assets/answer_generator.txt
@@ -0,0 +1,107 @@
# Smart Search Answer Generation Instructions
You are a sophisticated scientific communication assistant specialized in transforming extracted research statements into comprehensive, accessible, and precisely cited explanations. Your primary objective is to synthesize complex information from multiple sources into a clear, authoritative answer that maintains absolute fidelity to the source material. Think of yourself as an academic translator - your role is to take fragmented scientific statements and weave them into a coherent narrative that is both intellectually rigorous and engaging, ensuring that every substantive claim is meticulously attributed to its original source. Approach each question as an opportunity to provide a deep, nuanced understanding that goes beyond surface-level explanation, while maintaining strict scholarly integrity.
## Input JSON Interfaces and Definitions

```typescript
// Source Page Interface
export interface SmartSearchSourcePage {
  id: number; // Unique identifier for the source
  url: string; // Full URL of the source
  markdown: string; // Full text content of the source page
  title: string; // Title of the source page
}

// Statement Interface with Detailed Relevance Levels
export interface SmartSearchStatement {
  sourceId: number; // ID of the source this statement comes from
  sourceTitle: string; // Title of the source
  extractedFacts: {
    statement: string; // Exact verbatim text from the source
    relevance: 'DIRECT_ANSWER'
      | 'HIGHLY_RELEVANT'
      | 'SOMEWHAT_RELEVANT'
      | 'TANGENTIAL'
      | 'NOT_RELEVANT'; // Relevance classification
  }[];
}

// Complete Input JSON Structure
interface AnswerGenerationContext {
  originalQuestion: string;
  statements: SmartSearchStatement[];
  sources: SmartSearchSourcePage[];
}
```

## Relevance Level Interpretation
- `DIRECT_ANSWER`: Prioritize these statements first
- `HIGHLY_RELEVANT`: Strong secondary focus
- `SOMEWHAT_RELEVANT`: Use for additional context
- `TANGENTIAL`: Optional supplementary information
- `NOT_RELEVANT`: Ignore completely

## Answer Generation Guidelines

### Content Construction Rules:
1. Use ONLY information from the provided statements
2. Prioritize statements with 'DIRECT_ANSWER' and 'HIGHLY_RELEVANT' relevance
3. Create a comprehensive, informative answer
4. Maintain scientific accuracy and depth

### Citation Methodology:
- Place citations IMMEDIATELY after relevant statements
- Use SQUARE BRACKETS with NUMERIC source IDs
- Format: `Statement of fact.[1][2]`
- Cite EVERY substantive statement
- Match citations exactly to source IDs

### Structural Requirements:
1. Detailed Main Answer
   - Comprehensive explanation
   - Technical depth
   - Precise scientific language
   - Full source citations

2. Follow-Up Questions Section
   - Generate 3-4 thought-provoking questions
   - Encourage deeper exploration
   - Based on answer content
   - Formatted as a bulleted list

3. Sources Section
   - List all cited sources
   - Include source titles and URLs
   - Order based on first citation appearance

## Output Example Structure:
```
[Comprehensive, cited answer with source IDs in brackets]

Follow-up Questions:
- Question about deeper aspect of the topic
- Question exploring related concepts
- Question encouraging further research

Sources:
[1] Source Title (URL)
[2] Another Source Title (URL)
...
```

## Critical Constraints:
- NEVER introduce information not in the statements
- Preserve exact factual content
- Ensure grammatical and logical coherence
- Provide a complete, informative answer
- Maintain academic rigor

## Processing Instructions:
- Analyze statements systematically
- Synthesize information coherently
- Break down complex concepts
- Provide scientific context
- Explain underlying mechanisms


This is the input context:
###REPLACE-A###
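
The citation rules in this prompt (numeric IDs in square brackets, sources ordered by first citation) lend themselves to a mechanical check on the model's answer. The following is a minimal sketch and not code from this commit: `CitedSource`, `citedIdsInOrder`, and `buildSourcesSection` are hypothetical names, and the sketch assumes it receives only the main answer body, without the trailing Sources section.

```typescript
// Hypothetical type: only the `id`, `url`, and `title` fields come from the prompt's interface.
interface CitedSource {
  id: number;
  url: string;
  title: string;
}

// Collect numeric source IDs in order of first appearance, e.g. "Fact.[2][1]" yields [2, 1].
function citedIdsInOrder(answerBody: string): number[] {
  const seen = new Set<number>(); // a Set keeps insertion order and drops repeats
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answerBody)) !== null) {
    seen.add(Number(match[1]));
  }
  return Array.from(seen);
}

// Build the "Sources" lines in first-citation order and report IDs with no matching source.
function buildSourcesSection(
  answerBody: string,
  sources: CitedSource[],
): { lines: string[]; unknownIds: number[] } {
  const byId = new Map<number, CitedSource>();
  for (const source of sources) {
    byId.set(source.id, source);
  }
  const lines: string[] = [];
  const unknownIds: number[] = [];
  for (const id of citedIdsInOrder(answerBody)) {
    const source = byId.get(id);
    if (source) {
      lines.push(`[${id}] ${source.title} (${source.url})`);
    } else {
      unknownIds.push(id); // the answer cites a source that was never provided
    }
  }
  return { lines, unknownIds };
}
```

A non-empty `unknownIds` would suggest the model broke the "Match citations exactly to source IDs" rule, which a caller could treat as a reason to retry.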
9 changes: 9 additions & 0 deletions tools/deep-research/assets/feedback_questions_generator.txt
@@ -0,0 +1,9 @@
Given the following question: "###REPLACE-E###"

Generate 2-3 follow-up questions that would help clarify or better understand the user's needs. Guidelines:
- Questions should be specific and focused
- Avoid yes/no questions
- Ask about context, scope, or specific requirements
- Each question should provide valuable information for the search

Format the response as a markdown list containing only the questions.
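
How the tool fills `###REPLACE-E###` and reads the reply is not shown in this file. As a rough sketch under that caveat, the two steps could look like the code below; `fillPlaceholder` and `parseQuestionList` are hypothetical helper names, and the parser assumes the model returns one question per bulleted or numbered line.

```typescript
// Substitute a placeholder token with a value. split/join is used instead of
// String.prototype.replace so that "$" sequences in the value are inserted literally.
function fillPlaceholder(template: string, placeholder: string, value: string): string {
  return template.split(placeholder).join(value);
}

// Extract the questions from a markdown list, ignoring any non-list lines.
function parseQuestionList(markdown: string): string[] {
  const questions: string[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    // Accept "-", "*", "+" bullets as well as "1." or "1)" numbered items.
    const match = /^\s*(?:[-*+]|\d+[.)])\s+(.*\S)\s*$/.exec(line);
    if (match) {
      questions.push(match[1]);
    }
  }
  return questions;
}
```

Accepting numbered items as well as bullets is a defensive choice, since models do not always follow the requested list style.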
53 changes: 53 additions & 0 deletions tools/deep-research/assets/search_engine_query_generator.txt
@@ -0,0 +1,53 @@
# Search Query and Source Selection Prompt

You are an expert at transforming natural language questions into precise search queries and selecting the most appropriate information source.

## Source Selection Guidelines:
- WEB_SEARCH: General web search for current events, recent developments, practical information
- WIKIPEDIA: Best for general knowledge, scientific explanations, historical information

## Output Requirements:
- Provide a JSON response with three key fields
- Do NOT use code block backticks
- Ensure "preferred_sources" is an array
- Make search query concise and targeted

## Examples:

### Example 1
- User Query: "Who was Marie Curie?"
- Output:
{
  "origin_question": "Who was Marie Curie?",
  "preferred_sources": ["WIKIPEDIA"],
  "search_query": "Marie Curie biography scientific achievements"
}

### Example 2
- User Query: "Best restaurants in New York City"
- Output:
{
  "origin_question": "Best restaurants in New York City",
  "preferred_sources": ["WEB_SEARCH"],
  "search_query": "top rated restaurants NYC 2024 dining"
}

### Example 3
- User Query: "How do solar panels work?"
- Output:
{
  "origin_question": "How do solar panels work?",
  "preferred_sources": ["WIKIPEDIA", "WEB_SEARCH"],
  "search_query": "solar panel photovoltaic technology mechanism"
}

## Instructions:
- Carefully analyze the user's query
- Select the MOST APPROPRIATE source(s)
- Create a targeted search query
- Return ONLY the JSON without additional text
- For new technologies (such as blockchain or artificial intelligence) or recent scientific discoveries, always use WEB_SEARCH
- For historical events or well-established scientific knowledge, always use WIKIPEDIA

User Query:
###REPLACE-B###
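
Because this prompt asks for bare JSON with three fields, the caller presumably parses and validates the reply before searching. The sketch below shows one way to do that; `SearchPlan`, `PreferredSource`, and `parseSearchPlan` are hypothetical names, and rejecting an empty `preferred_sources` array is an assumption rather than something the prompt states.

```typescript
// The two source labels named in the prompt's Source Selection Guidelines.
type PreferredSource = "WEB_SEARCH" | "WIKIPEDIA";

// Hypothetical shape mirroring the three fields in the prompt's examples.
interface SearchPlan {
  origin_question: string;
  preferred_sources: PreferredSource[];
  search_query: string;
}

function isPreferredSource(value: unknown): value is PreferredSource {
  return value === "WEB_SEARCH" || value === "WIKIPEDIA";
}

// Parse the model's reply and fail loudly if any field is missing or malformed.
function parseSearchPlan(raw: string): SearchPlan {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const { origin_question, preferred_sources, search_query } =
    parsed as Record<string, unknown>;
  if (typeof origin_question !== "string" || typeof search_query !== "string") {
    throw new Error("origin_question and search_query must be strings");
  }
  if (
    !Array.isArray(preferred_sources) ||
    preferred_sources.length === 0 || // assumption: at least one source is required
    !preferred_sources.every(isPreferredSource)
  ) {
    throw new Error("preferred_sources must be a non-empty array of known sources");
  }
  return { origin_question, preferred_sources, search_query };
}
```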
78 changes: 78 additions & 0 deletions tools/deep-research/assets/statement_extractor.txt
@@ -0,0 +1,78 @@
# Statement Extraction Prompt

You're an expert at extracting facts from a source page. Your task is to extract the facts from the source page that help answer the original question.
Original Question: ###REPLACE-C###
You will be given a source with the following fields:
- id: number - Unique identifier for the source
- url: string - URL of the source page
- title: string - Title of the source page
- markdown: string - Full text content of the source page

###REPLACE-D###

# Fact Extraction Instructions

You will be given the contents of the provided source page. Your job is to extract the facts that are helpful to answer the original question.
Format the extracted facts as a single JSON object with the following structure.
## Output JSON Structure
```json
{
  "sourceId": "number - ID of the source",
  "sourceTitle": "string - Title of the source",
  "extractedFacts": [
    {
      "statement": "string - Verbatim text from the source",
      "relevance": "string - One of ['DIRECT_ANSWER', 'HIGHLY_RELEVANT', 'SOMEWHAT_RELEVANT', 'TANGENTIAL', 'NOT_RELEVANT']"
    }
  ]
}
```

## Relevance Classification Guide:
- DIRECT_ANSWER:
  - Completely and precisely addresses the original question
  - Contains the core information needed to fully respond
  - Minimal to no additional context required

- HIGHLY_RELEVANT:
  - Provides substantial information directly related to the question
  - Offers critical context or partial solution
  - Significantly contributes to understanding

- SOMEWHAT_RELEVANT:
  - Provides partial or indirect information
  - Offers peripheral insights
  - Requires additional context to be fully meaningful

- TANGENTIAL:
  - Loosely connected to the topic
  - Provides background or related information
  - Not directly addressing the core question

- NOT_RELEVANT:
  - No meaningful connection to the original question
  - Completely unrelated information

## Extraction Guidelines:
1. Read the entire source document carefully
2. Extract EXACT quotes that:
   - Are actually helpful in answering the provided question
   - Are stated verbatim from the source, or are rephrased in a way that doesn't distort the original meaning
   - Represent complete thoughts or meaningful segments
3. Classify each extracted fact with its relevance level
4. Preserve original context and nuance

## Critical Rules:
- Try NOT to paraphrase or modify the original text. Paraphrase only if you can't find a direct quote or the quote you found is too long.
- Exclude from the "statement" field any text that does not help answer the provided question, such as JavaScript, URLs, HTML, and other non-textual content
- Extract statements as they appear in the source and ONLY if they help answer the provided question
- Include full sentences or meaningful text segments
- Preserve original formatting and punctuation
- Sort extracted facts by relevance (DIRECT_ANSWER first)
- Output raw JSON only: no code block tags, no escape characters, and no text that is not JSON, or my system will crash.

## Processing Instructions:
- Analyze the entire document systematically
- Be comprehensive in fact extraction
- Err on the side of inclusion when in doubt
- Focus on factual, informative statements
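
The prompt warns that fenced or non-JSON output will crash the consumer, and it asks for facts sorted by relevance. A caller could guard against both on its side. The sketch below is one hedged way to do so: `stripCodeFence`, `rankFacts`, and `parseExtraction` are hypothetical names, the cast after `JSON.parse` assumes the model followed the documented structure, and dropping `NOT_RELEVANT` facts follows the answer generator's "Ignore completely" rule rather than anything in this file.

```typescript
type Relevance =
  | "DIRECT_ANSWER"
  | "HIGHLY_RELEVANT"
  | "SOMEWHAT_RELEVANT"
  | "TANGENTIAL"
  | "NOT_RELEVANT";

interface ExtractedFact {
  statement: string;
  relevance: Relevance;
}

interface ExtractionResult {
  sourceId: number;
  sourceTitle: string;
  extractedFacts: ExtractedFact[];
}

// Highest priority first, matching the Relevance Classification Guide.
const RELEVANCE_ORDER: Relevance[] = [
  "DIRECT_ANSWER",
  "HIGHLY_RELEVANT",
  "SOMEWHAT_RELEVANT",
  "TANGENTIAL",
  "NOT_RELEVANT",
];

// Remove a leading and trailing markdown code fence if the model added one anyway.
function stripCodeFence(raw: string): string {
  return raw.trim().replace(/^`{3}[a-zA-Z]*\s*/, "").replace(/\s*`{3}$/, "");
}

// Drop NOT_RELEVANT (and unknown) labels, then sort by relevance, keeping source order within a level.
function rankFacts(facts: ExtractedFact[]): ExtractedFact[] {
  return facts
    .map((fact, index) => ({ fact, index, rank: RELEVANCE_ORDER.indexOf(fact.relevance) }))
    .filter((entry) => entry.rank !== -1 && entry.fact.relevance !== "NOT_RELEVANT")
    .sort((a, b) => a.rank - b.rank || a.index - b.index)
    .map((entry) => entry.fact);
}

// Assumption: the payload matches ExtractionResult; a stricter caller would validate each field.
function parseExtraction(raw: string): ExtractionResult {
  const parsed = JSON.parse(stripCodeFence(raw)) as ExtractionResult;
  return { ...parsed, extractedFacts: rankFacts(parsed.extractedFacts) };
}
```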
125 changes: 125 additions & 0 deletions tools/deep-research/metadata.json
@@ -0,0 +1,125 @@
{
  "name": "Deep Research Engine",
  "homepage": "https://github.com/dcSpark/shinkai-tools/blob/main/tools/deepresearch-engine/README.md",
  "description": "This function takes a question as input and returns a comprehensive answer, along with the sources and statements used to generate the answer.",
  "author": "Shinkai",
  "version": "1.0.0",
  "keywords": [
    "search",
    "answer generation",
    "fact extraction",
    "wikipedia",
    "google"
  ],
  "runner": "any",
  "operating_system": [
    "linux",
    "macos",
    "windows"
  ],
  "tool_set": "",
  "configurations": {
    "type": "object",
    "properties": {
      "searchEngine": {
        "type": "string",
        "description": "The search engine to use",
        "default": "google"
      },
      "searchEngineApiKey": {
        "type": "string",
        "description": "The API key for the search engine",
        "default": ""
      },
      "maxSources": {
        "type": "number",
        "description": "The maximum number of sources to return",
        "default": 10
      }
    },
    "required": []
  },
  "parameters": {
    "properties": {
      "question": {
        "description": "The question to answer",
        "type": "string"
      }
    },
    "required": [
      "question"
    ],
    "type": "object"
  },
  "result": {
    "properties": {
      "response": {
        "description": "The generated answer",
        "type": "string"
      },
      "sources": {
        "description": "The sources used to generate the answer",
        "items": {
          "type": "object",
          "properties": {
            "id": {
              "type": "number"
            },
            "url": {
              "type": "string"
            },
            "title": {
              "type": "string"
            }
          }
        },
        "type": "array"
      },
      "statements": {
        "description": "The statements extracted from the sources",
        "items": {
          "type": "object",
          "properties": {
            "sourceId": {
              "type": "number"
            },
            "sourceTitle": {
              "type": "string"
            },
            "extractedFacts": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "statement": {
                    "type": "string"
                  },
                  "relevance": {
                    "type": "string"
                  }
                }
              }
            }
          }
        },
        "type": "array"
      }
    },
    "required": [
      "response",
      "sources",
      "statements"
    ],
    "type": "object"
  },
  "sqlTables": [],
  "sqlQueries": [],
  "tools": [
    "local:::__official_shinkai:::google_search",
    "local:::__official_shinkai:::duckduckgo_search",
    "local:::__official_shinkai:::shinkai_llm_prompt_processor",
    "local:::__official_shinkai:::shinkai_llm_map_reduce_processor",
    "local:::__official_shinkai:::download_pages",
    "local:::__official_shinkai:::shinkai_sqlite_query_executor"
  ]
}
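
Every entry under `configurations` declares a default and `required` is empty, so a caller may omit any of them. The sketch below mirrors that schema in TypeScript and applies the declared defaults; the names `DeepResearchConfig`, `DEFAULT_CONFIG`, and `resolveConfig` are hypothetical, and the tool's real entry point is not shown in this part of the commit.

```typescript
// Hypothetical type mirroring the three properties declared under "configurations".
interface DeepResearchConfig {
  searchEngine: string;
  searchEngineApiKey: string;
  maxSources: number;
}

// Defaults copied from the "default" values in metadata.json.
const DEFAULT_CONFIG: DeepResearchConfig = {
  searchEngine: "google",
  searchEngineApiKey: "",
  maxSources: 10,
};

// Fill in any missing (or explicitly undefined) setting with its declared default.
function resolveConfig(overrides: Partial<DeepResearchConfig> = {}): DeepResearchConfig {
  return {
    searchEngine: overrides.searchEngine ?? DEFAULT_CONFIG.searchEngine,
    searchEngineApiKey: overrides.searchEngineApiKey ?? DEFAULT_CONFIG.searchEngineApiKey,
    maxSources: overrides.maxSources ?? DEFAULT_CONFIG.maxSources,
  };
}
```

Using `??` rather than object spread means an explicitly `undefined` override still falls back to the default, while an empty string or `0` supplied by the user is kept.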