Skip to content

Shouldn't API client pass stream = True to the requests when downloading datasets? #10

@lcorcodilos

Description

@lcorcodilos

Linking to the issue of the same name in kaggle-api:

Kaggle/kaggle-cli#754

I'm raising it here since the problematic code is now maintained in kagglesdk (KaggleHttpClient.call is where I think this could be fixed).

TL;DR Not using stream=True in requests is causing entire datasets to be materialized in memory which makes it impossible to download anything of even a modest size.

I refer to the linked issue for more details but happy to expand the conversation here.

Tagging @leoauri and @i-aki-y for their visibility.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions