
Alexander Titterton

Papers by Alexander Titterton available to download and read on OA.mg are listed below.

DOI: 10.1088/2634-4386/ad2373
2024
Exploiting deep learning accelerators for neuromorphic workloads
Abstract Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency when performing inference with deep learning workloads. Error backpropagation is presently regarded as the most effective method for training SNNs, but in a twist of irony, when training on modern graphics processing units (GPUs) this becomes more expensive than non-spiking networks. The emergence of Graphcore's Intelligence Processing Units (IPUs) balances the parallelized nature of deep learning workloads with the sequential, reusable, and sparsified nature of operations prevalent when training SNNs. IPUs adopt multi-instruction multi-data (MIMD) parallelism by running individual processing threads on smaller data blocks, which is a natural fit for the sequential, non-vectorized steps required to solve spiking neuron dynamical state equations. We present an IPU-optimized release of our custom SNN Python package, snnTorch, which exploits fine-grained parallelism by utilizing low-level, pre-compiled custom operations to accelerate irregular and sparse data access patterns that are characteristic of training SNN workloads. We provide a rigorous performance assessment across a suite of commonly used spiking neuron models, and propose methods to further reduce training run-time via half-precision training. By amortizing the cost of sequential processing into vectorizable population codes, we ultimately demonstrate the potential for integrating domain-specific accelerators with the next generation of neural networks.
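For readers unfamiliar with snnTorch, the following is a minimal sketch of the kind of training loop the package supports on its standard PyTorch backend. The IPU-specific pre-compiled custom operations described in the paper are not shown, and the layer sizes, hyperparameters and rate-coded loss are illustrative assumptions, not the authors' configuration.

# Minimal sketch: training leaky integrate-and-fire (LIF) layers with snnTorch
# on the standard PyTorch backend. Sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn
import snntorch as snn
from snntorch import surrogate

num_steps = 25   # simulation time steps
beta = 0.9       # membrane decay constant

net = nn.Sequential(
    nn.Linear(784, 128),
    snn.Leaky(beta=beta, spike_grad=surrogate.fast_sigmoid(), init_hidden=True),
    nn.Linear(128, 10),
    snn.Leaky(beta=beta, spike_grad=surrogate.fast_sigmoid(), init_hidden=True, output=True),
)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(64, 784)                  # dummy batch of inputs
targets = torch.randint(0, 10, (64,))

spk_rec = []
for _ in range(num_steps):               # unroll the stateful network over time
    spk_out, mem_out = net(x)
    spk_rec.append(spk_out)

loss = loss_fn(torch.stack(spk_rec).sum(dim=0), targets)  # spike counts as logits
optimizer.zero_grad()
loss.backward()                          # backpropagation through time via autograd
optimizer.step()
# hidden states should be reset between batches, e.g. snntorch.utils.reset(net)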
DOI: 10.1088/2634-4386/ad2373/v3/response1
2024
Author response for "Exploiting deep learning accelerators for neuromorphic workloads"
DOI: 10.1609/aaai.v38i11.29087
2024
Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), that excel at accelerating parallel workloads and dense vector-matrix multiplications. Potentially more efficient neural network models utilizing sparsity and recurrence cannot leverage the full power of SIMD processors and are thus at a severe disadvantage compared to today's prominent parallel architectures like Transformers and CNNs, thereby hindering the path towards more sustainable AI. To overcome this limitation, we explore sparse and recurrent model training on a massively parallel multiple instruction multiple data (MIMD) architecture with distributed local memory. We implement a training routine based on backpropagation through time (BPTT) for the brain-inspired class of Spiking Neural Networks (SNNs) that feature binary sparse activations. We observe a massive advantage in using sparse activation tensors with a MIMD processor, the Intelligence Processing Unit (IPU), compared to GPUs. On training workloads, our results demonstrate 5-10x throughput gains compared to A100 GPUs and up to 38x gains for higher levels of activation sparsity, without a significant slowdown in training convergence or reduction in final model performance. Furthermore, our results show highly promising trends for both single- and multi-IPU configurations as we scale up to larger model sizes. Our work paves the way towards more efficient, non-standard models via AI training hardware beyond GPUs, and competitive large-scale SNN models.
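As an illustration of the recurrence that BPTT must traverse (this is not the authors' IPU implementation), the sketch below unrolls a leaky integrate-and-fire update with binary spike activations over T time steps and differentiates through it with a sigmoid surrogate gradient; all sizes and constants are hypothetical.

# Illustrative sketch: explicitly unrolled LIF recurrence with binary spikes,
# trained by backpropagation through time (BPTT) with a surrogate gradient.
import torch

class SpikeFn(torch.autograd.Function):
    # Heaviside spike in the forward pass, sigmoid surrogate in the backward pass.
    @staticmethod
    def forward(ctx, mem):
        ctx.save_for_backward(mem)
        return (mem > 0).float()                    # binary, typically sparse
    @staticmethod
    def backward(ctx, grad_out):
        (mem,) = ctx.saved_tensors
        sig = torch.sigmoid(5.0 * mem)
        return grad_out * 5.0 * sig * (1.0 - sig)

T, batch, n_in, n_hid = 50, 32, 100, 200            # illustrative sizes
beta, threshold = 0.9, 1.0
w = (0.1 * torch.randn(n_in, n_hid)).requires_grad_()

x = (torch.rand(T, batch, n_in) < 0.1).float()      # sparse binary input spikes
mem = torch.zeros(batch, n_hid)
spk = torch.zeros(batch, n_hid)
spikes = []
for t in range(T):                                  # sequential state update over T steps
    mem = beta * mem + x[t] @ w - threshold * spk   # leak, integrate, reset-by-subtraction
    spk = SpikeFn.apply(mem - threshold)
    spikes.append(spk)

loss = torch.stack(spikes).mean()                   # dummy rate objective
loss.backward()                                     # BPTT: gradients flow through all T steps
print(w.grad.norm())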
DOI: 10.1007/jhep10(2018)064
2018
Cited 10 times
Exploring sensitivity to NMSSM signatures with low missing transverse energy at the LHC
Abstract We examine scenarios in the Next-to-Minimal Supersymmetric Standard Model (NMSSM), where pair-produced squarks and gluinos decay via two cascades, each ending in a stable neutralino as Lightest Supersymmetric Particle (LSP) and a Standard Model (SM)-like Higgs boson, with mass spectra such that the missing transverse energy, $E_{\mathrm{T}}^{\mathrm{miss}}$, is very low. Performing two-dimensional parameter scans and focusing on the hadronic $H \to b\bar{b}$ decay giving a $b\bar{b}b\bar{b} + E_{\mathrm{T}}^{\mathrm{miss}}$ final state, we explore the sensitivity of a current LHC general-purpose jets + $E_{\mathrm{T}}^{\mathrm{miss}}$ analysis to such scenarios.
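As a toy illustration of the quantity this search relies on (this is not the analysis code used in the paper), the missing transverse energy is the magnitude of the negative vector sum of the visible transverse momenta in an event; the jet kinematics below are made up.

# Toy illustration: E_T^miss from a list of hypothetical visible objects.
import numpy as np

# hypothetical reconstructed objects as (pT [GeV], phi [rad]) pairs, e.g. four b-jets
visible = [(250.0, 0.3), (180.0, 2.9), (95.0, -1.2), (60.0, 1.7)]

px = sum(pt * np.cos(phi) for pt, phi in visible)
py = sum(pt * np.sin(phi) for pt, phi in visible)
met = np.hypot(px, py)    # |-(vector sum of visible pT)|
print(f"E_T^miss = {met:.1f} GeV")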
DOI: 10.48550/arxiv.2211.10725
2022
Cited 4 times
Intelligence Processing Units Accelerate Neuromorphic Learning
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency when performing inference with deep learning workloads. Error backpropagation is presently regarded as the most effective method for training SNNs, but in a twist of irony, when training on modern graphics processing units (GPUs) this becomes more expensive than non-spiking networks. The emergence of Graphcore's Intelligence Processing Units (IPUs) balances the parallelized nature of deep learning workloads with the sequential, reusable, and sparsified nature of operations prevalent when training SNNs. IPUs adopt multi-instruction multi-data (MIMD) parallelism by running individual processing threads on smaller data blocks, which is a natural fit for the sequential, non-vectorized steps required to solve spiking neuron dynamical state equations. We present an IPU-optimized release of our custom SNN Python package, snnTorch, which exploits fine-grained parallelism by utilizing low-level, pre-compiled custom operations to accelerate irregular and sparse data access patterns that are characteristic of training SNN workloads. We provide a rigorous performance assessment across a suite of commonly used spiking neuron models, and propose methods to further reduce training run-time via half-precision training. By amortizing the cost of sequential processing into vectorizable population codes, we ultimately demonstrate the potential for integrating domain-specific accelerators with the next generation of neural networks.
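One of the run-time reductions proposed above is half-precision training. The snippet below is a generic PyTorch mixed-precision sketch, not the IPU-specific implementation from the paper; it assumes a CUDA GPU and uses loss scaling to protect small gradients in float16.

# Generic half-precision (float16) training sketch in PyTorch; assumes a CUDA GPU.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()    # loss scaling to avoid float16 underflow
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(64, 784, device="cuda")
y = torch.randint(0, 10, (64,), device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = loss_fn(model(x), y)         # matmuls and activations run in float16

scaler.scale(loss).backward()           # scale the loss before backward
scaler.step(optimizer)                  # unscale gradients, skip step on overflow
scaler.update()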
DOI: 10.48550/arxiv.2008.09210
2020
Cited 5 times
Studying the potential of Graphcore IPUs for applications in Particle Physics
This paper presents the first study of Graphcore's Intelligence Processing Unit (IPU) in the context of particle physics applications. The IPU is a new type of processor optimised for machine learning. Comparisons are made for neural-network-based event simulation, multiple-scattering correction, and flavour tagging, implemented on IPUs, GPUs and CPUs, using a variety of neural network architectures and hyperparameters. Additionally, a Kálmán filter for track reconstruction is implemented on IPUs and GPUs. The results indicate that IPUs hold considerable promise in addressing the rapidly increasing compute needs in particle physics.
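For context, a single predict/update step of a linear Kálmán filter can be written in a few lines of NumPy, as sketched below with a toy two-parameter track state; this is a simplified illustration, not the track-reconstruction code benchmarked in the paper.

# Simplified linear Kalman filter predict/update step, for illustration only.
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# toy 2D state (position, slope) with position-only measurements
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[0.01]])
x, P = np.zeros(2), np.eye(2)
for z in [0.1, 0.22, 0.29, 0.41]:
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
print(x)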
DOI: 10.48550/arxiv.2311.04386
2023
Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), that excel at accelerating parallel workloads and dense vector-matrix multiplications. Potentially more efficient neural network models utilizing sparsity and recurrence cannot leverage the full power of SIMD processors and are thus at a severe disadvantage compared to today's prominent parallel architectures like Transformers and CNNs, thereby hindering the path towards more sustainable AI. To overcome this limitation, we explore sparse and recurrent model training on a massively parallel multiple instruction multiple data (MIMD) architecture with distributed local memory. We implement a training routine based on backpropagation through time (BPTT) for the brain-inspired class of Spiking Neural Networks (SNNs) that feature binary sparse activations. We observe a massive advantage in using sparse activation tensors with a MIMD processor, the Intelligence Processing Unit (IPU), compared to GPUs. On training workloads, our results demonstrate 5-10x throughput gains compared to A100 GPUs and up to 38x gains for higher levels of activation sparsity, without a significant slowdown in training convergence or reduction in final model performance. Furthermore, our results show highly promising trends for both single- and multi-IPU configurations as we scale up to larger model sizes. Our work paves the way towards more efficient, non-standard models via AI training hardware beyond GPUs, and competitive large-scale SNN models.
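To make the notion of activation sparsity concrete, the toy snippet below generates a binary spike tensor at a hypothetical 5% firing rate, measures its density, and compares dense storage with a COO-style event list. It only illustrates why routing spike events can beat moving full activation tensors; it is not the authors' benchmark code.

# Toy illustration of binary, sparse spike activations; sizes and rate are hypothetical.
import torch

T, batch, neurons = 100, 32, 1024
spikes = (torch.rand(T, batch, neurons) < 0.05).float()   # ~5% of neurons fire per step

nnz = int(spikes.count_nonzero())
density = nnz / spikes.numel()
print(f"activation density: {density:.3f}")

# Rough storage comparison: dense float32 grid vs. a COO-style event list
# (one int64 index per dimension per spike event).
dense_bytes = spikes.numel() * 4
event_bytes = nnz * (3 * 8)
print(f"dense: {dense_bytes} B, event list: {event_bytes} B")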
DOI: 10.1088/2634-4386/ad2373/v2/response1
2023
Author response for "Exploiting deep learning accelerators for neuromorphic workloads"
DOI: 10.1007/s41781-021-00057-z
2021
Studying the Potential of Graphcore® IPUs for Applications in Particle Physics
Abstract This paper presents the first study of Graphcore’s Intelligence Processing Unit (IPU) in the context of particle physics applications. The IPU is a new type of processor optimised for machine learning. Comparisons are made for neural-network-based event simulation, multiple-scattering correction, and flavour tagging, implemented on IPUs, GPUs and CPUs, using a variety of neural network architectures and hyperparameters. Additionally, a Kálmán filter for track reconstruction is implemented on IPUs and GPUs. The results indicate that IPUs hold considerable promise in addressing the rapidly increasing compute needs in particle physics.
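As a rough sketch of what a neural-network flavour tagger of the kind compared here looks like (the input features, network size and labels below are hypothetical, not those used in the paper), a small binary classifier over per-jet features can be set up as follows.

# Hypothetical per-jet flavour tagger: a small MLP producing a b-jet logit.
import torch
import torch.nn as nn

n_features = 16                          # hypothetical per-jet input features
tagger = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),                    # logit for b-jet vs light-jet
)

jets = torch.rand(128, n_features)       # dummy batch of jets
labels = torch.randint(0, 2, (128, 1)).float()

loss = nn.BCEWithLogitsLoss()(tagger(jets), labels)
loss.backward()
print(f"loss = {loss.item():.3f}")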
2019
Novel searches for NMSSM signatures with low missing transverse energy with the CMS detector at the LHC