ChatGoogleGenerativeAI
You can access Google's gemini and gemini-vision models, as well as other generative models, in LangChain through the ChatGoogleGenerativeAI class in the @langchain/google-genai integration package.
You can also access Google's gemini family of models via the LangChain VertexAI and VertexAI-web integrations. See the Google VertexAI integration docs for details.
Get an API key here: https://ai.google.dev/tutorials/setup
You'll first need to install the @langchain/google-genai package:
- npm: npm install @langchain/google-genai
- Yarn: yarn add @langchain/google-genai
- pnpm: pnpm add @langchain/google-genai
Usage
We're unifying model params across all packages. We now suggest using model instead of modelName, and apiKey for API keys.
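For example, a construction sketch that passes the key explicitly might look like this (the apiKey value is a placeholder; if omitted, the GOOGLE_API_KEY environment variable is used instead):

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

// Sketch: pass the model name and API key directly.
// "YOUR_API_KEY" is a placeholder; omit `apiKey` to fall back to
// the GOOGLE_API_KEY environment variable.
const llm = new ChatGoogleGenerativeAI({
  model: "gemini-pro",
  apiKey: "YOUR_API_KEY",
});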
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { HarmBlockThreshold, HarmCategory } from "@google/generative-ai";
/*
* Before running this, you should make sure you have created a
* Google Cloud Project that has `generativelanguage` API enabled.
*
* You will also need to generate an API key and set
* an environment variable GOOGLE_API_KEY
*
*/
// Text
const model = new ChatGoogleGenerativeAI({
model: "gemini-pro",
maxOutputTokens: 2048,
safetySettings: [
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
},
],
});
// Batch and stream are also supported
const res = await model.invoke([
[
"human",
"What would be a good company name for a company that makes colorful socks?",
],
]);
console.log(res);
/*
AIMessage {
content: '1. Rainbow Soles\n' +
'2. Toe-tally Colorful\n' +
'3. Bright Sock Creations\n' +
'4. Hue Knew Socks\n' +
'5. The Happy Sock Factory\n' +
'6. Color Pop Hosiery\n' +
'7. Sock It to Me!\n' +
'8. Mismatched Masterpieces\n' +
'9. Threads of Joy\n' +
'10. Funky Feet Emporium\n' +
'11. Colorful Threads\n' +
'12. Sole Mates\n' +
'13. Colorful Soles\n' +
'14. Sock Appeal\n' +
'15. Happy Feet Unlimited\n' +
'16. The Sock Stop\n' +
'17. The Sock Drawer\n' +
'18. Sole-diers\n' +
'19. Footloose Footwear\n' +
'20. Step into Color',
name: 'model',
additional_kwargs: {}
}
*/
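As the comment above notes, .batch() and .stream() are supported as well. A minimal sketch (the prompts are illustrative):

// Sketch: batch several prompts in one call.
const batchRes = await model.batch([
  [["human", "Name a color."]],
  [["human", "Name a fruit."]],
]);

// Sketch: stream the response as AIMessageChunks.
const stream = await model.stream([
  ["human", "Tell me a short joke."],
]);
for await (const chunk of stream) {
  console.log(chunk.content);
}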
API Reference:
- ChatGoogleGenerativeAI from @langchain/google-genai
Tool calling
The Google GenerativeAI package (as of version 0.0.23) does not allow tool schemas to contain an object with unknown properties. The Google VertexAI package (as of version 0.0.20) does support this pattern. See the Google VertexAI package documentation for details.
For example, the following Zod schema will throw an error:
const schema = z.object({
properties: z.record(z.unknown()), // Not allowed
});
or
const schema = z.record(z.unknown()); // Not allowed
Instead, you should explicitly define the properties of the object field, or use the Google VertexAI package.
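As an illustrative sketch, a schema whose object fields are declared explicitly (the field names below are made up) is accepted:

// Sketch: declare every property explicitly instead of using z.record().
const allowedSchema = z.object({
  location: z.string().describe("The city to look up"),
  unit: z.enum(["celsius", "fahrenheit"]).optional(),
});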
import { StructuredTool } from "@langchain/core/tools";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { z } from "zod";
const model = new ChatGoogleGenerativeAI({
model: "gemini-pro",
});
// Define your tool
class FakeBrowserTool extends StructuredTool {
schema = z.object({
url: z.string(),
query: z.string().optional(),
});
name = "fake_browser_tool";
description =
"useful for when you need to find something on the web or summarize a webpage.";
async _call(_: z.infer<this["schema"]>): Promise<string> {
return "fake_browser_tool";
}
}
// Bind your tools to the model
const modelWithTools = model.bind({
tools: [new FakeBrowserTool()],
});
// Or, you can use `.bindTools` which works the same under the hood
// const modelWithTools = model.bindTools([new FakeBrowserTool()]);
const res = await modelWithTools.invoke([
[
"human",
"Search the web and tell me what the weather will be like tonight in new york. use a popular weather website",
],
]);
console.log(res.tool_calls);
/*
[
{
name: 'fake_browser_tool',
args: {
query: 'weather in new york',
url: 'https://www.google.com/search?q=weather+in+new+york'
}
}
]
*/
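If you want to execute the tool the model selected, you can feed the returned arguments back into the tool instance. A rough sketch (error handling omitted):

// Sketch: run the first tool call returned by the model.
const toolCall = res.tool_calls?.[0];
if (toolCall) {
  const { url, query } = toolCall.args;
  const toolResult = await new FakeBrowserTool().invoke({ url, query });
  console.log(toolResult); // "fake_browser_tool"
}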
API Reference:
- StructuredTool from @langchain/core/tools
- ChatGoogleGenerativeAI from @langchain/google-genai
See the above run's LangSmith trace here
.withStructuredOutput
import { StructuredTool } from "@langchain/core/tools";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { z } from "zod";
const model = new ChatGoogleGenerativeAI({
model: "gemini-pro",
});
// Define your tool
class FakeBrowserTool extends StructuredTool {
schema = z.object({
url: z.string(),
query: z.string().optional(),
});
name = "fake_browser_tool";
description =
"useful for when you need to find something on the web or summarize a webpage.";
async _call(_: z.infer<this["schema"]>): Promise<string> {
return "fake_browser_tool";
}
}
const tool = new FakeBrowserTool();
// Bind your tools to the model
const modelWithTools = model.withStructuredOutput(tool.schema, {
name: tool.name, // this is optional
});
// Optionally, you can pass just a Zod schema, or JSONified Zod schema
// const modelWithTools = model.withStructuredOutput(
// zodSchema,
// );
const res = await modelWithTools.invoke([
[
"human",
"Search the web and tell me what the weather will be like tonight in new york. use a popular weather website",
],
]);
console.log(res);
/*
{
url: 'https://www.accuweather.com/en/us/new-york-ny/10007/night-weather-forecast/349014',
query: 'weather tonight'
}
*/
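As the comment in the example notes, you can also pass a standalone Zod schema rather than a tool. A minimal sketch (the schema fields are illustrative):

// Sketch: structured output from a plain Zod schema, no tool class required.
const extractionSchema = z.object({
  answer: z.string(),
  confidence: z.number().optional(),
});

const structuredModel = model.withStructuredOutput(extractionSchema);
const structured = await structuredModel.invoke([
  ["human", "What is the capital of France?"],
]);
// `structured` is a plain object matching the schema, e.g. { answer: "Paris", ... }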
API Reference:
- StructuredTool from @langchain/core/tools
- ChatGoogleGenerativeAI from @langchain/google-genai
See the above run's LangSmith trace here
Multimodal support
To provide an image, pass a human message with a content field set to an array of content objects. Each content object contains either an image value (type image_url) or a text value (type text). The value of image_url must be a base64-encoded image (e.g., data:image/png;base64,abcd124):
import fs from "fs";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { HumanMessage } from "@langchain/core/messages";
// Multi-modal
const vision = new ChatGoogleGenerativeAI({
model: "gemini-pro-vision",
maxOutputTokens: 2048,
});
const image = fs.readFileSync("./hotdog.jpg").toString("base64");
const input2 = [
new HumanMessage({
content: [
{
type: "text",
text: "Describe the following image.",
},
{
type: "image_url",
image_url: `data:image/jpeg;base64,${image}`,
},
],
}),
];
const res2 = await vision.invoke(input2);
console.log(res2);
/*
AIMessage {
content: ' The image shows a hot dog in a bun. The hot dog is grilled and has a dark brown color. The bun is toasted and has a light brown color. The hot dog is in the center of the bun.',
name: 'model',
additional_kwargs: {}
}
*/
// Multi-modal streaming
const res3 = await vision.stream(input2);
for await (const chunk of res3) {
console.log(chunk);
}
/*
AIMessageChunk {
content: ' The image shows a hot dog in a bun. The hot dog is grilled and has grill marks on it. The bun is toasted and has a light golden',
name: 'model',
additional_kwargs: {}
}
AIMessageChunk {
content: ' brown color. The hot dog is in the center of the bun.',
name: 'model',
additional_kwargs: {}
}
*/
API Reference:
- ChatGoogleGenerativeAI from @langchain/google-genai
- HumanMessage from @langchain/core/messages
Gemini Prompting FAQs
As of the time this doc was written (2023/12/12), Gemini has some restrictions on the types and structure of prompts it accepts. Specifically:
- When providing multimodal (image) inputs, you are restricted to at most 1 message of "human" (user) type. You cannot pass multiple messages (though the single human message may have multiple content entries)
- System messages are not natively supported, and will be merged with the first human message if present.
- For regular chat conversations, messages must follow the human/ai/human/ai alternating pattern (see the sketch after this list). You may not provide two AI or human messages in sequence.
- Messages may be blocked if they violate the safety checks of the LLM. In this case, the model will return an empty response.
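Putting these constraints together, a conforming multi-turn prompt might look like the following sketch (the message contents are illustrative; the system message is folded into the first human message by the integration):

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

// Sketch: alternating human/ai messages, with at most one system message up front.
const faqModel = new ChatGoogleGenerativeAI({ model: "gemini-pro" });

const reply = await faqModel.invoke([
  ["system", "You are a concise assistant."],
  ["human", "What is LangChain?"],
  ["ai", "LangChain is a framework for building applications with LLMs."],
  ["human", "Summarize that in five words."],
]);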