Workshop Title: Introduction to PyTorch: Building Blocks for Deep Learning with Molecular Data

We cordially invite you to the workshop "Introduction to PyTorch: Building Blocks for Deep Learning with Molecular Data", part of the NHR SW Third LLM Workshop series, organized by Saarland University.


Description

This hands-on workshop provides a practical introduction to PyTorch, one of the most popular deep learning frameworks. Designed for beginners, it walks participants through the foundational steps of working with PyTorch, from setting up the environment to building, training, and evaluating simple neural networks, with a special focus on molecular property prediction. The session emphasizes how molecular data (such as SMILES strings or molecular graphs) can be represented and processed using modern deep learning tools.


Workshop Topics

1. Setting Up the Environment

Get hands-on guidance on setting up your development environment for molecular machine learning using PyTorch. This includes:

  • Installing PyTorch (CPU or GPU version) with pip or conda.
  • Setting up RDKit for molecular representation and feature extraction.
  • Installing PyTorch Geometric (PyG) for graph-based deep learning.
  • Working with Jupyter Notebooks or Google Colab for easy prototyping.
  • Downloading and organizing benchmark molecular datasets (e.g., QM9, ESOL, Tox21).
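
To confirm the setup works end to end, a short sanity-check script along these lines can be run. It is a minimal sketch that assumes PyTorch, RDKit, and PyTorch Geometric have already been installed with pip or conda; exact versions and package names may differ on your system.

  # minimal environment sanity check (illustrative)
  import torch
  print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

  from rdkit import Chem
  print("RDKit OK, canonical SMILES for ethanol:", Chem.MolToSmiles(Chem.MolFromSmiles("CCO")))

  import torch_geometric
  print("PyTorch Geometric:", torch_geometric.__version__)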

2. Getting Familiar with Molecular Data

Gain an understanding of different ways molecules are digitally represented and how to work with them:

  • Introduction to SMILES strings and how to convert them into usable data formats.
  • Generating molecular fingerprints (e.g., Morgan fingerprints) and descriptor vectors.
  • Creating molecular graphs: atoms as nodes, bonds as edges.
  • Using RDKit to parse molecular files and extract features.
  • Understanding dataset formats used in molecular property prediction (e.g., CSV, SDF, JSON).
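
As a rough illustration of these representations, the sketch below turns a SMILES string into a Morgan fingerprint and simple atom/bond lists. It assumes RDKit and PyTorch are installed; the exact featurization used in the workshop may differ.

  # SMILES -> RDKit Mol -> Morgan fingerprint and graph lists (illustrative sketch)
  import torch
  from rdkit import Chem
  from rdkit.Chem import AllChem

  mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
  # 2048-bit Morgan fingerprint with radius 2 (ECFP4-like)
  fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
  fp_tensor = torch.tensor(list(fp), dtype=torch.float)   # ready to feed a PyTorch model

  # atoms as nodes, bonds as edges
  atom_features = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
  edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
  print(len(atom_features), "atoms,", len(edges), "bonds,", fp_tensor.shape[0], "fingerprint bits")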

3. Working on Baseline Classical Models

Start with traditional machine learning baselines to understand performance anchors:

  • Using scikit-learn and compatible libraries to train models like Random Forest, XGBoost, or Ridge Regression on molecular descriptors.
  • Preprocessing and splitting data (train/val/test).
  • Evaluating models using regression metrics like MAE, RMSE, and R².
  • Discussion: Why deep learning might outperform classical approaches on larger datasets.

4. Simple Neural Network Approaches

Build your first neural networks in PyTorch:

  • Creating a feedforward neural network (MLP) using torch.nn.Module.
  • Feeding molecular fingerprints as input features.
  • Applying activation functions, dropout, and normalization.
  • Training using optimizers like Adam and loss functions such as MSELoss.
  • Logging results and visualizing performance with matplotlib.
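
A compact version of such a model is sketched below; the layer sizes, dropout rate, and toy training loop are illustrative choices rather than the exact settings used in the workshop.

  # minimal fingerprint-based MLP regressor in PyTorch (illustrative)
  import torch
  import torch.nn as nn

  class FingerprintMLP(nn.Module):
      def __init__(self, in_dim=2048, hidden=256):
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(0.2),
              nn.Linear(hidden, hidden), nn.ReLU(),
              nn.Linear(hidden, 1),
          )

      def forward(self, x):
          return self.net(x).squeeze(-1)

  model = FingerprintMLP()
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
  loss_fn = nn.MSELoss()

  x = torch.rand(32, 2048)   # placeholder batch of fingerprints
  y = torch.rand(32)         # placeholder regression targets

  for epoch in range(5):     # toy training loop on a single batch
      optimizer.zero_grad()
      loss = loss_fn(model(x), y)
      loss.backward()
      optimizer.step()
      print(f"epoch {epoch}: loss {loss.item():.4f}")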

5. Graph-Based Approaches

Dive into molecular graph learning using PyTorch Geometric:

  • Introduction to Graph Neural Networks (GNNs) and Message Passing Neural Networks (MPNNs).
  • Representing molecules as graphs using torch_geometric.data.Data.
  • Building a simple GCN (Graph Convolutional Network) using GCNConv.
  • Implementing pooling and readout layers to get fixed-size graph embeddings.
  • Training and evaluating GNNs on molecular datasets.
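
The sketch below shows the overall shape of such a model; the feature dimension and the toy three-atom graph are placeholders, and a real pipeline would build Data objects from RDKit molecules and batch them with PyG's DataLoader.

  # small GCN with two GCNConv layers and a mean-pooling readout (illustrative)
  import torch
  import torch.nn as nn
  import torch.nn.functional as F
  from torch_geometric.data import Data
  from torch_geometric.nn import GCNConv, global_mean_pool

  class SimpleGCN(nn.Module):
      def __init__(self, in_dim=9, hidden=64):
          super().__init__()
          self.conv1 = GCNConv(in_dim, hidden)
          self.conv2 = GCNConv(hidden, hidden)
          self.head = nn.Linear(hidden, 1)

      def forward(self, data):
          x = F.relu(self.conv1(data.x, data.edge_index))
          x = F.relu(self.conv2(x, data.edge_index))
          x = global_mean_pool(x, data.batch)   # readout: one embedding per graph
          return self.head(x).squeeze(-1)

  # toy 3-node "molecule": node features x, undirected edges listed in both directions
  data = Data(
      x=torch.rand(3, 9),
      edge_index=torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]),
      batch=torch.zeros(3, dtype=torch.long),   # all three nodes belong to graph 0
  )
  print(SimpleGCN()(data))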

6. Using Pretrained Models for Molecular Property Prediction

Explore transfer learning and leveraging pretrained models:

  • Overview of existing pretrained models like ChemBERTa, D-MPNN, and MolBERT.
  • Loading pretrained GNNs or transformer models for fine-tuning.
  • Fine-tuning pretrained models on downstream tasks like solubility or toxicity prediction.
  • Comparing performance vs. models trained from scratch.
  • Tools and libraries that offer pretrained molecular models (e.g., Hugging Face, Open Graph Benchmark).
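
As one concrete illustration, a ChemBERTa checkpoint can be loaded from the Hugging Face Hub roughly as follows. The checkpoint name is one publicly available variant, and the freshly initialized regression head would still need fine-tuning on the downstream task before its predictions mean anything.

  # load a ChemBERTa checkpoint for SMILES-based regression (illustrative)
  import torch
  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  checkpoint = "seyonec/ChemBERTa-zinc-base-v1"   # one public ChemBERTa variant
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForSequenceClassification.from_pretrained(
      checkpoint, num_labels=1, problem_type="regression"
  )

  # encode a small batch of SMILES strings and run one forward pass with dummy targets
  smiles = ["CCO", "c1ccccc1"]
  batch = tokenizer(smiles, padding=True, return_tensors="pt")
  out = model(**batch, labels=torch.tensor([[0.5], [1.2]]))
  print("loss:", out.loss.item(), "| predictions:", out.logits.squeeze(-1))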

Note: The exact contents and order of the topics may be adjusted slightly.