The official Kaggle API is a command-line utility written in Python 3, but its documentation covers only command-line usage, not Python usage. This post explains how to use the API (version 1.5.6) from within Python.

1. Installing the Kaggle API

Run pip install kaggle to install the API. If you run into installation issues on Linux or Mac, you might need to run pip install --user kaggle instead.

2. Setting up the API Key

The Kaggle API requires an API token. Go to the Account tab (https://www.kaggle.com/<username>/account) and click ‘Create API Token’. A file named kaggle.json will be downloaded. Move this file to the ~/.kaggle/ folder on Mac and Linux, or to C:\Users\<username>\.kaggle\ on Windows. The file is required for authentication, so do not skip this step.

Alternatively, you can authenticate by setting the KAGGLE_USERNAME and KAGGLE_KEY environment variables to the values in kaggle.json. Note that environment variables take precedence over the kaggle.json file, so setting them incorrectly will cause authentication to fail even if kaggle.json itself is correct.
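If you go the environment-variable route, set the variables before the kaggle package is imported: in the 1.5.x releases, importing the package (which also happens when you import kaggle.api.kaggle_api_extended) already attempts to authenticate. Below is a minimal sketch that copies the values out of a kaggle.json file, assuming the file has the {"username": ..., "key": ...} layout of the downloaded token. The function name export_kaggle_credentials is hypothetical, not part of the kaggle package.

```python
import json
import os


def export_kaggle_credentials(json_path):
    """Copy username and key from a kaggle.json file into the environment.

    Assumes the token file layout {"username": "...", "key": "..."}.
    Returns the username so the caller can log which account is in use.
    """
    with open(json_path, encoding="utf-8") as fh:
        token = json.load(fh)
    # These two variables take precedence over ~/.kaggle/kaggle.json.
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]
    return token["username"]
```

Call it first, for example export_kaggle_credentials('kaggle.json'), and only then run the import lines from the next section.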

3. Initializing and Authenticating

Use the following lines of code to get an authenticated API instance:

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
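When authentication fails, the first question is usually which credentials the client is actually seeing. The helper below is hypothetical (written for this post, not part of the kaggle package) and only approximates the lookup order described in section 2: environment variables first, then kaggle.json in ~/.kaggle. The real client's logic may differ in details, so treat it as a diagnostic sketch.

```python
import os
from pathlib import Path


def kaggle_credentials_source(config_dir=None):
    """Return where credentials would come from, or None if none are found.

    Checks KAGGLE_USERNAME/KAGGLE_KEY first, then kaggle.json in config_dir
    (defaulting to ~/.kaggle).
    """
    if os.environ.get("KAGGLE_USERNAME") and os.environ.get("KAGGLE_KEY"):
        return "environment"
    # On Windows, Path.home() / ".kaggle" resolves to C:\Users\<username>\.kaggle
    folder = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    token = folder / "kaggle.json"
    if token.is_file():
        return str(token)
    return None
```

Running it before the import, and raising a clear error when it returns None, gives a more readable failure than a traceback from inside the library.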

4. Interacting with competitions

4.1 Searching competitions

# Searching competitions
# Signature: competitions_list(group=None, category=None, sort_by=None, page=1, search=None)
competitions = api.competitions_list(search='cat', category='playground')

# competitions is a list of competition objects;
# iterate through it to access individual competitions
for comp in competitions:
    print(comp.ref, comp.reward, comp.userRank, sep=',')
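competitions_list returns one page of results at a time, as the page parameter in its signature suggests. To collect every match, you can keep requesting pages until one comes back empty. This is a minimal sketch that assumes an empty page marks the end of the results; iter_competitions is a hypothetical helper name, and max_pages is a safety cap added here.

```python
def iter_competitions(api, max_pages=50, **filters):
    """Yield competitions from every results page, not just page 1.

    `api` is an authenticated KaggleApi instance; `filters` are passed
    unchanged to competitions_list (search=, category=, sort_by=, group=).
    Stops at the first empty page or after max_pages requests.
    """
    for page in range(1, max_pages + 1):
        batch = api.competitions_list(page=page, **filters)
        if not batch:
            return
        yield from batch
```

For example, list(iter_competitions(api, search='cat', category='playground')) returns all matching competitions as a single list.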

 

Most of the list methods in the API have a command-line counterpart (for example, competitions_list_cli for competitions_list) that prints formatted results. These are not very useful for automation, since they print their results instead of returning them, but they can be handy when exploring the API.
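If you do want the formatted table from one of the *_cli methods as a value, say for logging or a quick report, you can capture what it prints. This sketch uses only the standard library; capture_cli_output is a hypothetical helper, and it assumes the CLI method writes its output to standard output with print.

```python
import contextlib
import io


def capture_cli_output(cli_method, *args, **kwargs):
    """Call a *_cli method and return everything it printed as a string."""
    buffer = io.StringIO()
    # Temporarily send print() output into the buffer instead of the console.
    with contextlib.redirect_stdout(buffer):
        cli_method(*args, **kwargs)
    return buffer.getvalue()
```

For example, capture_cli_output(api.competitions_list_cli, search='cat') returns the table the method would otherwise print.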

4.2 Listing and downloading competition files

# List files for a competition
# Signature: competitions_data_list_files(id, **kwargs)
api.competitions_data_list_files('titanic')

# Download all files for a competition
# Signature: competition_download_files(competition, path=None, force=False, quiet=True)
api.competition_download_files('titanic')

# Download a single file for a competition
# Signature: competition_download_file(competition, file_name, path=None, force=False, quiet=False)
api.competition_download_file('titanic', 'gender_submission.csv')
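competition_download_files has no unzip option in the signature above, so the files arrive as an archive. The sketch below downloads and extracts in one step. It assumes the archive is saved as <competition>.zip in the target folder; check the folder if your version names it differently. download_competition is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def download_competition(api, competition, path="."):
    """Download all files of a competition and extract them into `path`.

    Assumes competition_download_files saves '<competition>.zip' in `path`.
    Returns the sorted names of the extracted archive members.
    """
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)
    api.competition_download_files(competition, path=str(target), quiet=True)
    archive = target / f"{competition}.zip"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        names = zf.namelist()
    return sorted(names)
```

For example, download_competition(api, 'titanic', 'data/titanic') would leave the extracted CSV files in data/titanic.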

4.3 Submitting to competitions

# Signature: competition_submit(file_name, message, competition, quiet=False)
api.competition_submit('gender_submission.csv','API Submission','titanic')
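To see how a submission scored, the API also has a competition_submissions(competition) method that returns your submissions to that competition. Unlike the leaderboard, this includes your most recent entry, not just your best one. This sketch picks the latest submission, assuming each returned object has a date attribute (the matching CLI command shows fileName, date, description, status, publicScore and privateScore columns). latest_submission is a hypothetical helper, and the method's exact signature may vary between versions.

```python
def latest_submission(submissions):
    """Return the most recent submission in a list, or None if it is empty.

    Assumes each item has a `date` attribute that sorts chronologically
    (a datetime, or an ISO-8601 string).
    """
    if not submissions:
        return None
    return max(submissions, key=lambda s: s.date)
```

Usage: latest = latest_submission(api.competition_submissions('titanic')), then print(latest.date, latest.status, latest.publicScore). A freshly submitted entry may still be pending and have no score yet.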

4.4 Retrieving the Leaderboard

# Signature: competition_view_leaderboard(id, **kwargs)
leaderboard = api.competition_view_leaderboard('titanic')
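competition_view_leaderboard is a lower-level method, so it returns raw data rather than formatted rows. This sketch pulls out the top entries, assuming the result is a dictionary with a 'submissions' list whose entries carry 'teamName' and 'score' keys; print leaderboard once to confirm the shape in your version. top_teams is a hypothetical helper. Rows keep the order the API returned, because whether a higher or lower score is better depends on the competition's metric.

```python
def top_teams(leaderboard, n=10):
    """Return (teamName, score) pairs for the first n leaderboard rows.

    Assumes the shape {'submissions': [{'teamName': ..., 'score': ...}]};
    scores go through float() in case they arrive as strings.
    """
    rows = leaderboard.get("submissions", [])[:n]
    return [(row["teamName"], float(row["score"])) for row in rows]
```

For example, top_teams(leaderboard, 5) gives the first five rows as name/score pairs.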

5. Interacting with datasets

5.1 Searching datasets

# Signature: dataset_list(sort_by=None, size=None, file_type=None, license_name=None, tag_ids=None, search=None, user=None, mine=False, page=1, max_size=None, min_size=None)
datasets = api.dataset_list(search='demographics', license_name='cc', file_type='csv')

# datasets is a list of dataset objects
for dat in datasets:
    print(dat.ref, dat.viewCount, dat.voteCount, sep=',')
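Printing fields with sep=',' breaks as soon as a value contains a comma, as dataset titles often do. This small standard-library sketch writes any of these result lists as properly quoted CSV. to_csv is a hypothetical helper, and it assumes only that each item exposes the attributes you name.

```python
import csv
import io


def to_csv(items, fields):
    """Render the named attributes of API result objects as CSV text.

    Values containing commas or quotes are quoted by the csv module;
    attributes an item lacks become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(fields)
    for item in items:
        writer.writerow([getattr(item, name, "") for name in fields])
    return buffer.getvalue()
```

For example, to_csv(datasets, ['ref', 'viewCount', 'voteCount']) produces the same columns as the loop above, ready to write to a file.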

5.2 Listing dataset files

# Signature: dataset_list_files(dataset)
# The dataset string should be in the format [owner]/[dataset-name]
api.dataset_list_files('avenn98/world-of-warcraft-demographics').files
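The .files attribute is a list of file objects. This sketch turns it into a plain list of names, optionally filtered by extension, assuming each file object has a name attribute; file_names is a hypothetical helper.

```python
def file_names(list_result, extension=None):
    """Return file names from a dataset_list_files() result.

    Assumes list_result.files holds objects with a `name` attribute.
    Pass extension='.csv' to keep only matching files (case-insensitive).
    """
    names = [f.name for f in list_result.files]
    if extension:
        names = [n for n in names if n.lower().endswith(extension.lower())]
    return names
```

For example: file_names(api.dataset_list_files('avenn98/world-of-warcraft-demographics'), '.csv').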

5.3 Downloading Files

# Download all files of a dataset
# Signature: dataset_download_files(dataset, path=None, force=False, quiet=True, unzip=False)
api.dataset_download_files('avenn98/world-of-warcraft-demographics')

# Download a single file
# Signature: dataset_download_file(dataset, file_name, path=None, force=False, quiet=True)
api.dataset_download_file('avenn98/world-of-warcraft-demographics', 'WoW Demographics.csv')
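dataset_download_files accepts unzip=True, but dataset_download_file has no such option, and larger single files may arrive compressed as <file_name>.zip. The sketch below handles both cases. download_dataset_file is a hypothetical helper, and the zip naming is an assumption to verify against your own downloads.

```python
import zipfile
from pathlib import Path


def download_dataset_file(api, dataset, file_name, path="."):
    """Download one dataset file and return the path to the raw file.

    Assumption: the file lands either as <path>/<file_name> or as
    <path>/<file_name>.zip containing a member with that same name.
    """
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)
    api.dataset_download_file(dataset, file_name, path=str(target))
    plain = target / file_name
    archive = target / (file_name + ".zip")
    if not plain.exists() and archive.exists():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    if not plain.exists():
        raise FileNotFoundError(f"{file_name} not found in {target}")
    return plain
```

The returned path can be passed straight to csv.reader or any other loader.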

6. Interacting with Kernels

6.1 Searching Kernels

# Signature: kernels_list(page=1, page_size=20, dataset=None, competition=None, parent_kernel=None, search=None, mine=False, user=None, language=None, kernel_type=None, output_type=None, sort_by=None)
kernels = api.kernels_list(search='titanic')
for kernel in kernels:
    print(kernel.ref, kernel.totalVotes, kernel.language, sep=',')
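Because each kernel object exposes fields such as language and totalVotes, standard-library tools work directly on the list. As one example, this sketch counts the search results per language with collections.Counter; count_by is a hypothetical helper.

```python
from collections import Counter


def count_by(items, attribute):
    """Count API result objects by one attribute (missing values count as None)."""
    return Counter(getattr(item, attribute, None) for item in items)
```

For example, count_by(kernels, 'language').most_common() lists languages from most to least frequent.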

6.2 Retrieve a kernel's output

# Retrieve output for a specified kernel
# Signature: kernels_output(kernel, path, force=False, quiet=True)
api.kernels_output('startupsci/titanic-data-science-solutions',path='.')
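kernels_output writes the kernel's output files into path. To find out what a call actually wrote without depending on its return value, you can compare the folder before and after. fetch_new_files is a hypothetical helper. Files that already existed and were overwritten are not reported as new.

```python
from pathlib import Path


def fetch_new_files(download, path="."):
    """Run download(path) and return the files it added under path.

    `download` is any callable taking the target folder, for example a
    lambda wrapping api.kernels_output.
    """
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)
    before = {p for p in target.rglob("*") if p.is_file()}
    download(str(target))
    after = {p for p in target.rglob("*") if p.is_file()}
    return sorted(after - before)
```

For example: fetch_new_files(lambda folder: api.kernels_output('startupsci/titanic-data-science-solutions', path=folder), 'titanic-output').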

6.3 Get the status of the latest kernel run

# Signature: kernels_status(kernel)
api.kernels_status('startupsci/titanic-data-science-solutions')
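kernels_status is most useful right after a push, when you want to wait for the run to finish. This polling sketch assumes kernels_status returns a dictionary with a 'status' key (and a 'failureMessage' key on failure), and that 'queued' and 'running' are the in-progress states. wait_for_kernel is a hypothetical helper; its sleep parameter exists only so the loop can be tested without actually waiting.

```python
import time


def wait_for_kernel(api, kernel, poll_seconds=30, timeout_seconds=3600,
                    sleep=time.sleep):
    """Poll kernels_status until the run leaves the queued/running states.

    Returns the final status dictionary; raises TimeoutError once
    timeout_seconds of polling have passed without a final state.
    """
    waited = 0
    while True:
        response = api.kernels_status(kernel)
        status = response.get("status")
        if status not in ("queued", "running"):
            return response
        if waited >= timeout_seconds:
            raise TimeoutError(f"{kernel} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, result = wait_for_kernel(api, 'startupsci/titanic-data-science-solutions'), then inspect result['status'] and, on failure, result.get('failureMessage').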

 

6.4 Pull a kernel to local machine

# Signature: kernels_pull(kernel, path, metadata=False, quiet=True)
api.kernels_pull('startupsci/titanic-data-science-solutions',path='.')

6.5 Initialize metadata file for a kernel

# Signature: kernels_initialize(folder)
api.kernels_initialize('./demo')
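kernels_initialize writes a kernel-metadata.json template with placeholder values that need to be replaced before pushing. This sketch fills in the main fields, assuming the key names id, title, code_file, language and kernel_type from the generated template; open your own copy to confirm. fill_kernel_metadata is a hypothetical helper, and other keys such as dataset_sources are left untouched.

```python
import json
from pathlib import Path


def fill_kernel_metadata(folder, kernel_id, title, code_file,
                         language="python", kernel_type="script"):
    """Replace the placeholder fields in <folder>/kernel-metadata.json.

    Key names are assumed to match the template kernels_initialize writes;
    all other keys (dataset_sources, is_private, ...) are kept unchanged.
    """
    meta_path = Path(folder) / "kernel-metadata.json"
    metadata = json.loads(meta_path.read_text(encoding="utf-8"))
    metadata.update({
        "id": kernel_id,             # "<username>/<kernel-slug>"
        "title": title,
        "code_file": code_file,      # script or notebook file inside folder
        "language": language,        # e.g. "python"
        "kernel_type": kernel_type,  # e.g. "script" or "notebook"
    })
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata
```

For example: fill_kernel_metadata('./demo', 'your-username/demo-kernel', 'Demo Kernel', 'demo.py').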

6.6 Pushing a kernel to Kaggle

# The folder must contain a valid metadata file named 'kernel-metadata.json'
# Create one with kernels_initialize if you don't have one
# Signature: kernels_push(folder)
api.kernels_push('./demo')

 

Have any questions? Please add them as comments and I will do my best to answer them.

5 thoughts on “Kaggle API – The Missing Python Documentation”
  1. This is helpful. I have a question: if a previous version exists, how do I do something like api.dataset_list_files('avenn98/world-of-warcraft-demographics/version/1').files? I want to download some data from a previous version. Thanks

  2. Thanks this is great info. Is there a way to retrieve your score of your most recent competition submission? Retrieving the leaderboard can get you the best score, but not the most recent score.
