Sujith:) 4782a9c0b5 Project Init 1 month ago

Gemini Live

Hey there! 👋 Welcome to Gemini Live, a fun Flutter app where you can have real-time, voice-to-voice chats with a Large Language Model (LLM).

Getting Started

1. Clone the Repository

git clone http://106.51.72.251:3000/Terra/gemini-live-proto.git
cd gemini-live-proto

2. Hooking Up Firebase

This app uses Firebase for its backend magic. Here's how to get it set up:

  1. Spin up a Firebase Project: If you don't have one yet, go create a new project over at the Firebase Console.
  2. Add Your Android & iOS Apps: Inside your new Firebase project, add an Android app and an iOS app. You can try following the on-screen guides, but you are probably better off with a Medium blog post found through a Google search. I'm not covering those steps here because Google updates the Firebase options and GUI more often than I can update this README.
  3. Grab Those Config Files:
    • For Android: Download the google-services.json file and drop it into the android/app/ directory.
    • For iOS: Download the GoogleService-Info.plist file and drop it into the ios/Runner/ directory.
  4. Turn on Google AI: In the Firebase console, you should see an "AI Build" or similar option in the panel on the left. Click on it and follow the instructions until the screen shows a success message. We use the Gemini Developer API, not Vertex AI, so remember to choose the right option!
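
If you want to double-check step 3 before moving on, here is a small shell sketch that looks for the two config files at the paths given above. The function name check_firebase_config is made up for this README and is not part of the Flutter or Firebase tooling. It only checks that the files exist, not that their contents are valid.

```shell
# Hypothetical helper: report whether the Firebase config files are where
# this README says they should go.
# Usage: check_firebase_config [project_root]   (defaults to the current directory)
check_firebase_config() {
  root="${1:-.}"
  missing=0
  for f in android/app/google-services.json ios/Runner/GoogleService-Info.plist; do
    if [ -f "$root/$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  # Non-zero exit status if at least one file is absent.
  return "$missing"
}
```

Paste the function into your shell and run check_firebase_config from the project root (or pass the project path as the first argument). It returns a non-zero status if either file is missing.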

3. Environment Variables

  1. Create a file called .env right in the root of your project.
  2. Fill it with the keys below. You can find the values in the Firebase project you created earlier (if you did, that is).
# Android
ANDROID_API_KEY="your_android_api_key"
ANDROID_APP_ID="your_android_app_id"
ANDROID_MESSAGING_SENDER_ID="your_messaging_sender_id"
ANDROID_PROJECT_ID="your_project_id"
ANDROID_STORAGE_BUCKET="your_storage_bucket"

# iOS
IOS_API_KEY="your_ios_api_key"
IOS_APP_ID="your_ios_app_id"
IOS_MESSAGING_SENDER_ID="your_messaging_sender_id"
IOS_PROJECT_ID="your_project_id"
IOS_STORAGE_BUCKET="your_storage_bucket"
IOS_BUNDLE_ID="your_ios_bundle_id"

I have probably added keys that aren't strictly needed, but who am I to judge? Thanks to unhelpful Firebase guides, I decided to throw everything in there and figure out a cleanup much later, since getting it working was, and still is, the highest priority.
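
Since the list is long, a quick check that your .env defines every key from this section can save a confusing startup failure. The sketch below is only an illustration: check_env_keys is a made-up name, and it only tests that a line starting with KEY= exists, not that the value is correct.

```shell
# Hypothetical helper: print every key from this README's list that is
# absent from the given .env file.
# Usage: check_env_keys [path_to_env]   (defaults to ./.env)
check_env_keys() {
  env_file="${1:-.env}"
  missing=0
  for key in \
    ANDROID_API_KEY ANDROID_APP_ID ANDROID_MESSAGING_SENDER_ID \
    ANDROID_PROJECT_ID ANDROID_STORAGE_BUCKET \
    IOS_API_KEY IOS_APP_ID IOS_MESSAGING_SENDER_ID \
    IOS_PROJECT_ID IOS_STORAGE_BUCKET IOS_BUNDLE_ID; do
    # A key counts as present if some line starts with KEY= (values are not validated).
    # A missing or unreadable file makes every key count as missing.
    if ! grep -q "^${key}=" "$env_file" 2>/dev/null; then
      echo "missing: $key"
      missing=1
    fi
  done
  # Non-zero exit status if at least one key is absent.
  return "$missing"
}
```

Run check_env_keys from the project root. It prints nothing and returns zero when all eleven keys are present.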

4. Grab Those Dependencies!

Once Firebase is sorted, let's get all the necessary packages:

flutter pub get

5. Cross your fingers!

And just like that, you're ready to roll! Run the application using:

flutter run

Peeks & Glimpses

Home Screen
Home Screen: The app's landing screen, where the next page stays disabled until the required permissions are granted.

Live Conversation Screen
Live Screen: The AI conversational screen.

How Things Are Laid Out (Project Structure)

Inspired by the .NET ecosystem, the folder structure is a little unconventional for a Flutter dev. It should become clear once you understand the contents. Trust me, it takes little to no time to get used to.

lib/
├── application/    # Contains the screens of the app
├── infrastructure/ # Concrete implementations that the screens in application depend on.
├── services/       # The blueprints for our services - application talks to these, so that infrastructure can be swapped out anytime.
└── widgets/        # Handy-dandy reusable UI bits

Chatting with the AI & Its Personality (System Instructions)

Since the whole point of the app is the lively chats you can have with the AI, we set up the character and scenario in a "system_prompt" under the assets folder. It describes the scenario and the persona the LLM should respond with.

Right now, this system prompt lives in a local text file: assets/hostage_system_instruction.txt. In this example, it makes the AI play along with a serious hostage negotiation scenario. There are a couple of other examples as well, some that work really well and others that I don't want to talk about.

Fortunately, after spending 2 weeks' worth of RDR 2 in-game hours in real life, I found the documentation for the Gemini Live API that Google wants us to use. Most of the other documents I tried to follow either didn't work, were marked as deprecated, or will be deprecated soon enough: Live API Documentation

Tested Platforms

Current test status by platform:

| Platform | Status |
| --- | --- |
| Android | ❌ Untested |
| iOS | ✅ Tested |
| Windows | ❌ Untested |
| Web | ❌ Untested |

What's Next?

This project is still a work in progress and a playground for new ideas! Here's a sneak peek at where I might take it from here:

  • Fix Conversation Gaps: Address the awkward pauses and timing issues that occur between user speech and LLM responses to create a more natural conversational flow.
  • Smart Context Management: Implement playback cancellation so the LLM can catch up to the current context by interrupting older audio responses when new input arrives.
  • Thinking State Indicator: Add visual feedback in the UI to show when the user has finished speaking and the app is waiting for the LLM to process and respond.
  • More Voices!: Try more voice options (once the API lets me) and present them in the UI on the home screen.
  • Language Adventures: The currently jarring system_instruction is 'rajini', which I hope will sound good someday as I get the app working in languages other than en-US.