Commit 0c6e563

feat!: modernise project to current Deepgram Python standards (#20)
- Replace setup.py with pyproject.toml (setuptools backend, ruff/mypy/pytest config)
- Add full type hints and docstrings across all source files
- Add py.typed PEP 561 marker
- Export ConverterException and EmptyTranscriptException from __init__
- Add CI workflow (lint + type check + test matrix on Python 3.10–3.13)
- Switch release workflow to PyPI trusted publishing (OIDC, no API token)
- Fix PEP 639 license classifier conflict breaking pip install on newer setuptools
- Fix datetime.utcfromtimestamp() deprecation for Python 3.12+ compatibility

BREAKING CHANGE: webvtt() and srt() now raise EmptyTranscriptException when the converter returns no lines; previously returned an empty string.
1 parent 20ac932 commit 0c6e563

21 files changed

Lines changed: 1645 additions & 348 deletions
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+{
+  ".": "1.2.0"
+}

.github/release-please-config.json

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+{
+  "$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json",
+  "release-type": "python",
+  "bump-minor-pre-major": true,
+  "bump-patch-for-minor-pre-major": true,
+  "include-v-in-tag": true,
+  "packages": {
+    ".": {
+      "component": "deepgram-captions",
+      "include-component-in-tag": false,
+      "extra-files": [
+        "deepgram_captions/_version.py"
+      ]
+    }
+  }
+}

.github/workflows/ci.yml

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  lint:
+    name: Lint & typecheck
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v4
+        with:
+          python-version: "3.12"
+      - name: Install dev dependencies
+        run: pip install -e ".[dev]"
+      - name: Ruff format check
+        run: ruff format --check deepgram_captions/ test/
+      - name: Ruff lint
+        run: ruff check deepgram_captions/ test/
+      - name: Mypy
+        run: mypy deepgram_captions/
+
+  test:
+    name: Test Python ${{ matrix.python-version }}
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dev dependencies
+        run: pip install -e ".[dev]"
+      - name: Run tests
+        run: pytest test/ -v

.github/workflows/release.yml

Lines changed: 41 additions & 28 deletions
@@ -1,42 +1,55 @@
-# This workflow will upload a Python Package using Twine when a release is created
-# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
-
-# This workflow uses actions that are not certified by GitHub.
-# They are provided by a third-party and are governed by
-# separate terms of service, privacy policy, and support
-# documentation.
-
 name: Release
 
 on:
-  release:
-    types: [published]
+  push:
+    branches: [main]
+  workflow_dispatch:
 
 permissions:
-  contents: read
+  contents: write
+  pull-requests: write
 
 jobs:
-  deploy:
+  release-please:
+    name: Release Please
+    runs-on: ubuntu-latest
+    outputs:
+      release_created: ${{ steps.release.outputs.release_created }}
+      tag_name: ${{ steps.release.outputs.tag_name }}
+    steps:
+      - uses: googleapis/release-please-action@v4
+        id: release
+        with:
+          token: ${{ github.token }}
+          config-file: .github/release-please-config.json
+          manifest-file: .github/.release-please-manifest.json
+
+  publish:
+    name: Publish to PyPI
+    needs: release-please
+    if: ${{ needs.release-please.outputs.release_created }}
     runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/deepgram-captions
+    permissions:
+      id-token: write # required for trusted publishing
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
+
       - name: Set up Python
-        uses: actions/setup-python@v3
+        uses: actions/setup-python@v4
         with:
-          python-version: "3.x"
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install build
-      - name: Update Version in _version.py
-        run: sed -i "s/0.0.0/${{ github.event.release.tag_name }}/g" ./deepgram_captions/_version.py
+          python-version: "3.12"
+
+      - name: Install build tools
+        run: pip install --upgrade pip build
+
       - name: Build package
         run: python -m build
-      - name: Install twine
-        run: python -m pip install --upgrade twine
-      - name: Publish package
-        uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
-        with:
-          user: __token__
-          password: ${{ secrets.PYPI_API_TOKEN }}
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        # No API token needed — uses OIDC trusted publishing.
+        # Configure at: https://pypi.org/manage/project/deepgram-captions/settings/publishing/

CHANGELOG.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [1.2.0] - 2024-03-15
+
+### Added
+- `pyproject.toml` as the canonical build configuration (replaces `setup.py` as the primary build definition)
+- `py.typed` marker file for PEP 561 compliance — fully typed package
+- `Makefile` with `install`, `test`, `lint`, `lint-fix`, `format`, `format-check`, `typecheck`, `check`, and `dev` targets
+- GitHub Actions CI workflow (`ci.yml`) running lint, type checking, and tests across Python 3.10–3.13
+- `ruff` for linting and formatting (replaces `black`)
+- `mypy` for static type checking
+- Full type annotations on all public APIs in `helpers.py`, `converters.py`, `webvtt.py`, and `srt.py`
+- Comprehensive docstrings for all public classes and functions
+- `SECURITY.md` with responsible disclosure policy
+- `CHANGELOG.md` (this file)
+
+### Changed
+- `DeepgramConverter`, `AssemblyAIConverter`, and `WhisperTimestampedConverter` now carry full type hints
+- `webvtt()` and `srt()` functions are now fully typed with `Any` converter protocol
+- `EmptyTranscriptException` and `ConverterException` are now exported from the top-level `deepgram_captions` package
+- Updated classifiers to reflect Production/Stable status and Python 3.10–3.13 support
+- Release workflow updated to use `actions/checkout@v4` and `actions/setup-python@v4`
+- Release workflow version bumping now targets `pyproject.toml` instead of `_version.py` only
+
+### Fixed
+- `chunk_array` simplified to a single list comprehension (functionally identical, more idiomatic)
+
+## [1.1.0] - 2023-11-08
+
+### Added
+- `AssemblyAIConverter` — support for AssemblyAI speech-to-text API responses
+- `WhisperTimestampedConverter` — support for [Whisper Timestamped](https://github.com/linto-ai/whisper-timestamped) responses (word-level timestamps required)
+- `replace_text_with_word()` helper to normalise `"text"` key to `"word"` for Whisper Timestamped compatibility
+- Documentation note clarifying that OpenAI Whisper (without word timestamps) is not supported directly; users should use Deepgram's hosted Whisper Cloud (`model=whisper`) with `DeepgramConverter`
+
+### Changed
+- `get_lines()` on `AssemblyAIConverter` now respects `utterances` array when present, falling back to flat `words` array
+- `WhisperTimestampedConverter.get_lines()` processes `segments` array and applies `replace_text_with_word` normalisation
+
+## [1.0.0] - 2023-10-15
+
+### Added
+- Speaker diarisation support in `DeepgramConverter.get_lines()`: when word objects include a `"speaker"` field, caption lines break on speaker changes in addition to `line_length` limits
+- Speaker labels in WebVTT output using voice tags: `<v Speaker 0>text</v>`
+- Speaker labels in SRT output as `[speaker N]` prefix lines, emitted once per speaker change
+- `use_exception` parameter on `DeepgramConverter.__init__()` — set to `False` to suppress `ConverterException` when no valid transcript is found
+- `EmptyTranscriptException` raised by `webvtt()` and `srt()` when the converter returns an empty first line
+- `line_length` parameter on `webvtt()` and `srt()` — controls the maximum number of words per caption cue (default: 8)
+- `get_headers()` on `DeepgramConverter` returns a `NOTE` block for WebVTT output containing request ID, creation time, duration, and channel count from the Deepgram response metadata
+
+### Changed
+- `DeepgramConverter` now prefers the `utterances` array over `channels[0].alternatives[0].words` when both are present, producing more natural sentence-level caption breaks
+- `webvtt()` checks for `get_headers()` capability via `hasattr`/`callable` — custom converters do not need to implement it
+
+### Fixed
+- Microsecond precision in `seconds_to_timestamp()` correctly truncated to milliseconds for both WebVTT (`.`) and SRT (`,`) formats
+
+## [0.1.0] - 2023-09-20
+
+### Added
+- `DeepgramConverter` class wrapping Deepgram pre-recorded and streaming API responses
+- `webvtt()` function generating valid WebVTT documents from any converter
+- `srt()` function generating valid SRT documents from any converter
+- `seconds_to_timestamp()` utility converting float seconds to `HH:MM:SS.mmm` or `HH:MM:SS,mmm`
+- `chunk_array()` utility splitting word lists into fixed-length groups
+- `EmptyTranscriptException` for empty transcript detection
+- Support for Deepgram SDK response objects via `.to_json()` method detection
+- Initial test suite covering Deepgram pre-recorded responses
+
+## [0.0.1] - 2023-08-01
+
+### Added
+- Initial project scaffold
+- Package structure: `deepgram_captions/` with `__init__.py`, `helpers.py`, `converters.py`, `webvtt.py`, `srt.py`
+- `setup.py` with basic package metadata
+- MIT License
+- Initial README
+
+[1.2.0]: https://github.com/deepgram/deepgram-python-captions/compare/v1.1.0...v1.2.0
+[1.1.0]: https://github.com/deepgram/deepgram-python-captions/compare/v1.0.0...v1.1.0
+[1.0.0]: https://github.com/deepgram/deepgram-python-captions/compare/v0.1.0...v1.0.0
+[0.1.0]: https://github.com/deepgram/deepgram-python-captions/compare/v0.0.1...v0.1.0
+[0.0.1]: https://github.com/deepgram/deepgram-python-captions/releases/tag/v0.0.1
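
The changelog describes two helpers, `seconds_to_timestamp()` and `chunk_array()`, and the commit message mentions replacing the deprecated `datetime.utcfromtimestamp()`. The sketch below shows one way these could look. The signatures (in particular the `sep` parameter) are assumptions for illustration and may differ from the package's actual helpers.

```python
from datetime import datetime, timezone
from typing import Any


def seconds_to_timestamp(seconds: float, sep: str = ".") -> str:
    """Format seconds as HH:MM:SS<sep>mmm, truncating microseconds to milliseconds.

    Use sep="." for WebVTT and sep="," for SRT. Values of 24 hours or more
    wrap around, because only the time-of-day part is formatted.
    """
    # Timezone-aware replacement for datetime.utcfromtimestamp(),
    # which is deprecated since Python 3.12.
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    millis = moment.microsecond // 1000
    return f"{moment.strftime('%H:%M:%S')}{sep}{millis:03d}"


def chunk_array(items: list[Any], size: int) -> list[list[Any]]:
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i : i + size] for i in range(0, len(items), size)]
```

With these definitions, `seconds_to_timestamp(3661.25, sep=",")` yields an SRT-style timestamp and `chunk_array(words, 8)` groups words into cues of at most eight, matching the default `line_length` the changelog mentions.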
