speech config updates#8100

Draft
milaGGL wants to merge 9 commits into main from mila-speech-config-updates

@milaGGL
Contributor

@milaGGL milaGGL commented May 6, 2026

No description provided.

@gemini-code-assist
Contributor

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

@github-actions
Contributor

github-actions Bot commented May 6, 2026

📝 PRs merging into main branch

Our main branch should always be in a releasable state. If you are working on a larger change, or if you don't want this change to see the light of day just yet, consider using a feature branch first, and only merge into the main branch when the code is complete and ready to be released.

@google-oss-bot
Collaborator

The public api surface has changed for the subproject ai-logic_firebase-ai:
error: Added method com.google.firebase.ai.type.GenerationConfig.Builder.setSpeechConfig(com.google.firebase.ai.type.SpeechConfig) [AddedMethod]
error: Added field com.google.firebase.ai.type.GenerationConfig.Builder.speechConfig [AddedField]
error: Added class com.google.firebase.ai.type.MultiSpeakerVoiceConfig [AddedClass]
error: Added class com.google.firebase.ai.type.SpeakerVoiceConfig [AddedClass]
error: Added constructor com.google.firebase.ai.type.SpeechConfig(com.google.firebase.ai.type.MultiSpeakerVoiceConfig) [AddedMethod]
error: Added constructor com.google.firebase.ai.type.SpeechConfig(com.google.firebase.ai.type.MultiSpeakerVoiceConfig,String) [AddedMethod]
error: Added constructor com.google.firebase.ai.type.SpeechConfig(com.google.firebase.ai.type.Voice,String) [AddedMethod]
error: Added method com.google.firebase.ai.type.SpeechConfig.getLanguageCode() [AddedMethod]
error: Added method com.google.firebase.ai.type.SpeechConfig.getMultiSpeakerVoiceConfig() [AddedMethod]
error: Attempted to change nullability of com.google.firebase.ai.type.Voice (from NONNULL to NULLABLE) in method com.google.firebase.ai.type.SpeechConfig.getVoice() [InvalidNullConversion]
error: Attempted to change nullability of com.google.firebase.ai.type.Voice (from NONNULL to NULLABLE) in property SpeechConfig.voice [InvalidNullConversion]

Please update the api.txt files for the subprojects being affected by this change by running ./gradlew ${subproject}:generateApiTxtFile. Also perform a major/minor bump accordingly.

@milaGGL
Contributor Author

milaGGL commented May 13, 2026

/gemini review
/gemini summary

@gemini-code-assist
Contributor

Hi @milaGGL, thanks for reaching out. I have reviewed the pull request and have a few observations regarding the API changes and testing.

API Changes

It looks like you've introduced MultiSpeakerVoiceConfig and updated SpeechConfig to support multi-speaker configurations. The API surface changes noted by the bot are expected given these additions. Please ensure that you have run ./gradlew :ai-logic:firebase-ai:generateApiTxtFile to update the api.txt file as requested.
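
As a rough illustration of the surface described by the API diff, the sketch below uses stand-in classes that are not the SDK source. The class names, the `voice`, `multiSpeakerVoiceConfig`, and `languageCode` properties, and the constructor shapes follow the bot's report. The `speaker` and `speakerVoiceConfigs` field names, the nullable `languageCode`, and the use of `@JvmOverloads` are assumptions made only for illustration.

```kotlin
// Stand-in types mirroring the names in the API diff. This is NOT the SDK source.
class Voice(val voiceName: String)

// Assumption: the field names `speaker` and `speakerVoiceConfigs` are hypothetical.
class SpeakerVoiceConfig(val speaker: String, val voice: Voice)

class MultiSpeakerVoiceConfig(val speakerVoiceConfigs: List<SpeakerVoiceConfig>)

class SpeechConfig private constructor(
  // Nullable because a multi-speaker config carries no single voice.
  val voice: Voice?,
  val multiSpeakerVoiceConfig: MultiSpeakerVoiceConfig?,
  // Assumption: nullability of languageCode is not stated in the thread.
  val languageCode: String?,
) {
  // Single-speaker form. On the JVM, @JvmOverloads yields (Voice) and (Voice, String).
  @JvmOverloads
  constructor(voice: Voice, languageCode: String? = null) : this(voice, null, languageCode)

  // Multi-speaker form: (MultiSpeakerVoiceConfig) and (MultiSpeakerVoiceConfig, String).
  @JvmOverloads
  constructor(
    multiSpeakerVoiceConfig: MultiSpeakerVoiceConfig,
    languageCode: String? = null,
  ) : this(null, multiSpeakerVoiceConfig, languageCode)
}

// Usage with placeholder voice names and speaker labels.
fun singleSpeakerExample(): SpeechConfig = SpeechConfig(Voice("VOICE_A"), "en-US")

fun multiSpeakerExample(): SpeechConfig =
  SpeechConfig(
    MultiSpeakerVoiceConfig(
      listOf(
        SpeakerVoiceConfig("Speaker1", Voice("VOICE_A")),
        SpeakerVoiceConfig("Speaker2", Voice("VOICE_B")),
      )
    )
  )
```

Keeping the primary constructor private means a caller can set either a single voice or a multi-speaker config, never both, which is one plausible reason `voice` becomes nullable.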

Code Review Observations

  1. Maintainability: The introduction of SpeakerVoiceConfig and MultiSpeakerVoiceConfig is a clean way to handle the new requirements. The use of @kotlinx.serialization.Serializable is appropriate here.
  2. Testing: You have added comprehensive tests in GenerateContentTests.kt and LiveSessionTests.kt covering both single and multi-speaker configurations, which is excellent. The inclusion of a test case for invalid multi-speaker list sizes (testGenerateContent_speechConfig_multiSpeaker_invalidSize) is a good defensive practice.
  3. Documentation: The KDoc added to SpeakerVoiceConfig provides helpful context for users regarding supported voices.
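
The invalid-size test mentioned in item 2 suggests the speaker list has a size constraint. The thread does not say what that constraint is or whether it is enforced on the client or by the backend. The following is only a minimal sketch of a client-side check, with the permitted range passed in as a hypothetical `allowedSizes` parameter and with assumed field names.

```kotlin
// Stand-in types; field names are assumptions, not the SDK's definitions.
class Voice(val voiceName: String)

data class SpeakerVoiceConfig(val speaker: String, val voice: Voice)

// Hypothetical: `allowedSizes` is an illustration parameter because the
// real limit is not stated anywhere in this conversation.
class MultiSpeakerVoiceConfig(
  val speakerVoiceConfigs: List<SpeakerVoiceConfig>,
  allowedSizes: IntRange,
) {
  init {
    // Fail fast with IllegalArgumentException when the list size is out of range.
    require(speakerVoiceConfigs.size in allowedSizes) {
      "Expected a speaker count in $allowedSizes but got ${speakerVoiceConfigs.size}"
    }
  }
}

// Returns true when construction succeeds for the given number of speakers.
fun isValidSpeakerCount(count: Int, allowedSizes: IntRange): Boolean =
  runCatching {
    val speakers = List(count) { i -> SpeakerVoiceConfig("Speaker$i", Voice("VOICE_$i")) }
    MultiSpeakerVoiceConfig(speakers, allowedSizes)
  }.isSuccess
```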

Suggestions

  • API Compatibility: Since you are changing the nullability of Voice in SpeechConfig.getVoice(), please double-check that existing consumers of this API are prepared to handle null values to avoid NullPointerExceptions in their code.
  • Versioning: As the bot mentioned, please ensure you perform the appropriate major/minor version bump in the relevant configuration files to reflect these public API changes.
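
To make the nullability concern concrete: once `voice` is nullable, Kotlin callers that dereference it directly stop compiling, and Java callers can hit a `NullPointerException` at runtime. A minimal sketch of null-safe handling follows, using stand-in classes (not the SDK types) and a hypothetical fallback string.

```kotlin
// Stand-in types for illustration only.
class Voice(val voiceName: String)

class SpeechConfig(val voice: Voice?)

// Before the change, `config.voice.voiceName` would compile.
// With a nullable `voice`, callers need a safe call and a fallback.
fun describeVoice(config: SpeechConfig): String =
  config.voice?.voiceName ?: "no single-speaker voice set"
```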

Overall, the implementation looks solid and well-tested. Once the api.txt is updated and the versioning is handled, this should be ready for further consideration.

@milaGGL
Contributor Author

milaGGL commented May 13, 2026

/gemini review

@firebase firebase deleted a comment from google-oss-bot May 13, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for SpeechConfig within GenerationConfig, allowing users to specify voice and language settings for model responses, and adds MultiSpeakerVoiceConfig and SpeakerVoiceConfig to support multi-speaker configurations. Additionally, the TemplateImagenModel is deprecated, test utilities in AIModels.kt are refactored for better initialization, and extensive tests are added to verify the new speech features. The reviewer suggested converting the Voice class to a data class to improve consistency and leverage auto-generated methods.

Comment on lines 28 to 33
  public class Voice public constructor(public val voiceName: String) {

-   @Serializable internal data class Internal(@SerialName("voice_name") val voiceName: String)
-
-   internal fun toInternal(): Internal {
-     return Internal(this.voiceName)
+   internal fun toInternal(): PrebuiltVoiceInternal {
+     return PrebuiltVoiceInternal(this.voiceName)
    }
  }

low

For consistency with other model classes like SpeakerVoiceConfig and to benefit from auto-generated equals, hashCode, and toString methods, consider converting Voice to a data class.

public data class Voice(val voiceName: String) {
  internal fun toInternal(): PrebuiltVoiceInternal {
    return PrebuiltVoiceInternal(voiceName)
  }
}
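
For context on what the suggestion changes, the sketch below contrasts a regular class with a data class. The names `PlainVoice` and `DataVoice` are hypothetical and exist only for this comparison.

```kotlin
// Regular class: equals/hashCode fall back to identity, toString to the default form.
class PlainVoice(val voiceName: String)

// Data class: equals, hashCode, toString, copy, and component1 are generated.
data class DataVoice(val voiceName: String)

fun plainVoicesEqual(): Boolean = PlainVoice("VOICE_A") == PlainVoice("VOICE_A")

fun dataVoicesEqual(): Boolean = DataVoice("VOICE_A") == DataVoice("VOICE_A")

fun dataVoiceText(): String = DataVoice("VOICE_A").toString()
```

Here `plainVoicesEqual()` returns false and `dataVoicesEqual()` returns true. One trade-off to weigh: a data class also exposes `copy()` and `component1()` publicly, so the conversion would itself change the public API surface reported in api.txt.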

3 participants