# Kubeflow Pipelines

This guide will take you step-by-step through the process of fine-tuning BERT on
the SQuAD dataset, leveraging
[Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/) to
scale and automate the experiment in a [Kubeflow](https://www.kubeflow.org)
cluster.

There are several advantages to this approach compared to running the experiment
locally:

- 🚀 Scale your runs by leveraging more resources and more powerful machines.
- 🎏 Parallelize steps that can run independently.
- 🫙 Cache steps, such as data processing, to avoid repeating them on each run.
- 📅 Schedule recurring runs to retrain your model periodically.
- 📈 Track and visualize the experiment's configuration.
- ✨ Automate model deployment by integrating KFP with CI/CD pipelines once.

Here's how a complete pipeline looks like:

![run](images/pipeline.png)

## What you'll need

Before you start, make sure you have the following:

- A working Kubeflow deployment. Visit the
  [VirtML](https://github.com/dpoulopoulos/virtml) project page to find out how
  you can create a local Kubeflow deployment.
- A basic understanding of Kubeflow Pipelines. If you're new to KFP, check the
  [official documentation](https://www.kubeflow.org/docs/components/pipelines/).

## Procedure

1. Create a new Jupyter Notebook in your Kubeflow Notebook server. Make sure
   that the server can submit pipelines to the Kubeflow cluster.

1. Connect to the Jupyter Notebook server.

1. Launch a temrinal window and clone the repository:

    ```console
    user:~$ git clone https://github.com/dpoulopoulos/bert-qa-finetuning.git
    ```

1. Navigate to the project directory:

    ```console
    user:~$ cd bert-qa-finetuning
    ```

1. Install the required packages:

    ```console
    user:~/bert-qa-finetuning$ pip install -r requirements.txt
    ```

1. Open the `pipeline.ipynb` notebook.

1. Follow the instructions in the notebook to create, compile, and submit a
   Kubeflow Pipeline for fine-tuning BERT on the SQuAD dataset.

1. Create a TensorBoard instance to monitor the training process. You can submit
   the following YAML manifest, or use the UI to create one:

    ```yaml
    apiVersion: tensorboard.kubeflow.org/v1alpha1
    kind: Tensorboard
    metadata:
      name: bert-squad-logs
      namespace: kubeflow-user-example-com
    spec:
      logspath: pvc://bert-squad/logs
    ```

1. Access the TensorBoard instance to monitor the training process.

  ![tensorboard](images/tensorboard.png)

## Next steps

Congratulations! You've successfully created and submitted a Kubeflow Pipeline
to fine-tune BERT on the SQuAD dataset. You can now scale and automate the
experiment in your Kubeflow cluster.