Lecture 6

TFLOPs · Roofline Model · Triton · JAX · float64 · Distributed Training

Engineering & Performance of SciML

From theory to production: profiling, precision, and kernel optimization

Overview

A theoretically elegant model is useless if it cannot run at scale. This capstone lecture bridges the gap between research prototypes and production deployments. It covers four topics: FLOP counting and the Roofline model for each SciML method; precision pitfalls that are easy to overlook (PINNs require float64 because of the autograd signal chain); the ML infrastructure stack (PyTorch vs. JAX/XLA, distributed training strategies); and writing specialized Triton kernels that fuse the FFT-GEMM-iFFT pipeline in FNO for a 2.5× speedup.
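The precision pitfall can be made concrete with a small sketch. The overview does not spell out the mechanism, so the following rests on an assumption: one common way float32 hurts residual-based losses is cancellation, where two terms of order one differ by less than float32 can resolve. The helper names (`to_float32`, `residual`) and the gap of 1e-9 are illustrative only and are not taken from the lecture code. The sketch uses the standard library rather than PyTorch so that it runs without dependencies.

```python
import struct
import sys


def to_float32(x: float) -> float:
    """Round a Python float (IEEE float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def residual(lhs: float, rhs: float, cast=float) -> float:
    """Compute lhs - rhs with inputs and result rounded by `cast`."""
    return cast(cast(lhs) - cast(rhs))


# Two O(1) terms of a hypothetical PDE residual that differ by 1e-9.
true_gap = 1e-9
lhs = 1.0 + true_gap
rhs = 1.0

r64 = residual(lhs, rhs)                    # float64 arithmetic
r32 = residual(lhs, rhs, cast=to_float32)   # emulated float32 arithmetic

eps32 = 2.0 ** -23                # float32 machine epsilon
eps64 = sys.float_info.epsilon    # float64 machine epsilon

print(f"float32 eps = {eps32:.3e}, residual = {r32:.3e}")
print(f"float64 eps = {eps64:.3e}, residual = {r64:.3e}")
```

In float32 the gap is below half a unit in the last place of 1.0, so both terms round to the same value and the residual is exactly zero. An optimizer would see no signal from this term, while float64 still resolves it.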


Benchmark & Results

Setup

A100 GPU profiling of all five SciML methods

Result

PINNs: memory-bound; FNO: bandwidth-limited; NODE: compute-bound

Figure: Engineering & Performance of SciML — benchmark results
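The memory-bound versus compute-bound labels come from the Roofline model [1], in which attainable throughput is the smaller of the peak compute rate and arithmetic intensity times memory bandwidth. A minimal sketch of that classification follows. The device figures are assumptions taken from the publicly listed A100 40 GB numbers (19.5 TFLOP/s FP32 without tensor cores, 1555 GB/s HBM bandwidth). The two example kernels are generic illustrations and not the profiled SciML methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    peak_flops: float     # peak compute throughput in FLOP/s
    mem_bandwidth: float  # peak memory bandwidth in bytes/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) where the two roofs meet."""
        return self.peak_flops / self.mem_bandwidth


def attainable_flops(device: Device, intensity: float) -> float:
    """Roofline bound: min(peak compute, intensity * bandwidth)."""
    return min(device.peak_flops, intensity * device.mem_bandwidth)


def classify(device: Device, flops: float, bytes_moved: float) -> str:
    """Label a kernel by which roof limits it."""
    intensity = flops / bytes_moved
    return "compute-bound" if intensity >= device.ridge_point else "memory-bound"


# Assumed device numbers (A100 40 GB, FP32 without tensor cores).
A100 = Device("A100-40GB", peak_flops=19.5e12, mem_bandwidth=1.555e12)

# Example 1: elementwise add of N float32 values.
# 1 FLOP per element; 2 reads + 1 write of 4 bytes each.
n_elem = 1 << 24
add_kind = classify(A100, flops=n_elem, bytes_moved=12 * n_elem)

# Example 2: square float32 matmul, ideal reuse (each matrix moved once).
# 2*n^3 FLOPs; 3 matrices of n^2 elements at 4 bytes each.
n = 4096
matmul_kind = classify(A100, flops=2 * n**3, bytes_moved=3 * n * n * 4)

print(f"ridge point: {A100.ridge_point:.2f} FLOP/byte")
print(f"elementwise add: {add_kind}")
print(f"matmul n={n}: {matmul_kind}")
```

The byte counts here assume ideal data reuse. A measured kernel usually moves more data than this lower bound, which pushes it further toward the memory-bound side.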

Lecture Slides

The full slide deck for this lecture is available as a PDF. Each slide includes speaker notes for the presenter.


Code

The annotated implementation for this lecture is in sciml_engineering_benchmark.py. All code is written in PyTorch and prioritizes clarity over cleverness.

# sciml_engineering_benchmark.py
# See the attached file for the full annotated implementation.
# Key classes and functions are documented with docstrings.
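Independently of that file, the FFT-GEMM-iFFT pipeline named in the overview can be sketched as three explicit stages. This is a pure-Python reference for a 1D spectral convolution with a naive O(n^2) transform. It is not the lecture's implementation and not a Triton kernel. The function names and the weight layout `weights[mode][out_channel][in_channel]` are assumptions made for illustration; a PyTorch version would typically use `torch.fft.rfft` and `torch.fft.irfft` instead of the hand-written transforms.

```python
import cmath
import math


def rfft(x):
    """Naive real-input DFT; returns frequency bins 0..n//2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def irfft_lowpass(y_hat, n):
    """Inverse real DFT from the lowest len(y_hat) bins; higher bins are zero.

    Assumes every retained bin lies below the Nyquist frequency.
    """
    out = []
    for t in range(n):
        acc = y_hat[0].real
        for k in range(1, len(y_hat)):
            acc += 2.0 * (y_hat[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(acc / n)
    return out


def spectral_conv_1d(x, weights, n_modes):
    """x: [c_in][n] real samples; weights: [n_modes][c_out][c_in] complex."""
    c_in, n = len(x), len(x[0])
    c_out = len(weights[0])
    if not 1 <= n_modes <= n // 2:
        raise ValueError("n_modes must be in [1, n // 2]")

    # Stage 1 (FFT): transform each input channel.
    x_hat = [rfft(channel) for channel in x]

    # Stage 2 (GEMM): one small c_out x c_in matrix-vector product per mode.
    y_hat = [
        [sum(weights[k][o][i] * x_hat[i][k] for i in range(c_in))
         for k in range(n_modes)]
        for o in range(c_out)
    ]

    # Stage 3 (iFFT): back to physical space, one output channel at a time.
    return [irfft_lowpass(y_hat[o], n) for o in range(c_out)]


# Demo: two input channels, one output channel, unit weights, two modes kept.
n = 8
low = [math.cos(2 * math.pi * t / n) for t in range(n)]       # frequency 1
high = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]  # frequency 3
weights = [[[1 + 0j, 1 + 0j]] for _ in range(2)]
y = spectral_conv_1d([low, high], weights, n_modes=2)
print([round(v, 6) for v in y[0]])
```

In this unfused form each stage materializes a full intermediate array (`x_hat`, then `y_hat`). On a GPU those intermediates would normally pass through global memory between kernel launches, and that traffic is what a fused kernel would try to avoid.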


References

  1. Williams, S., Waterman, A., & Patterson, D. (2009). Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM, 52(4), 65–76. DOI: 10.1145/1498765.1498785
  2. Tillet, P., Kung, H. T., & Cox, D. (2019). Triton: An intermediate language and compiler for tiled neural network computations. MAPL 2019.
  3. Anonymous (2025). High-Performance Fourier Neural Operator. arXiv preprint. arXiv:2504.11681

Cite As

If you use this lecture material in your research or teaching, please cite the primary reference:

@misc{jing2025sciml6,
  title  = {Lecture 6: Engineering \& Performance of SciML},
  author = {Jing, Cheng},
  year   = {2025},
  note   = {An Intro Course to Scientific Machine Learning, Arizona State University},
  url    = {https://jessecj.me/course/lecture-6-engineering-performance},
  howpublished = {\url{https://jessecj.me/course/lecture-6-engineering-performance}}
}