- PseudoPack
“PseudoPack is a software library for numerical differentiation by pseudospectral methods. This library provides subroutines for computing the derivative of the Fourier collocation methods for periodical domain and Chebyshev and Legendre collocation methods for the non-periodical domain. State-of-the-art numerical techniques such as Even-Odd Decomposition, and specialized fast algorithms are employed to increase the efficiency of the library. Kosloff-Tal-Ezer Mapping is used for reducing roundoff error in the Chebyshev and Legendre collocation methods. Moreover, highly accurate Lagrange polynomial interpolation is used when forming the differentiation matrices decreasing solution contamination by roundoff error when dealing with a large number of grid points. Other routines for filtering and grid mapping are also included.
Since the user is shielded from any coding errors in one of the main computational kernels, reliability of the solution is enhanced. PseudoPack will speed up code development, increase productivity and enhance re-usability.
The library contains several simple user callable subroutines that return the derivatives and/or filtering (smoothing) of, possibly multi-dimensional, data sets. In term of flexibility and user interaction, any aspect of the library can be modified by a simple change of a small set of input parameters.
The source codes of the library are written in Fortran 90.
Using the macro and conditional capability of C Preprocessor, this software package can be compiled into several versions with several different computational platforms. Several popular computational platforms (IBM RS6000, SGI Cray, SGI, SUN) are supported to take advantages of any existing optimized native library such as General Matrix-Matrix Multiply (GEMM) from Basic Linear Algebra Level 3 Subroutine (BLAS 3), Fast Fourier Transform (FFT) and Fast Cosine/Sine Transform (CFT/SFT).”
- chebfun
“The chebfun project is a collection of algorithms, and a software system in object-oriented MATLAB, which extends familiar powerful methods of numerical computation involving numbers to continuous or piecewise-continuous functions. It also implements continuous analogues of linear algebra notions like the QR decomposition and the SVD. The mathematical basis of the system combines tools of Chebyshev expansions, fast Fourier transform, barycentric interpolation, Clenshaw-Curtis quadrature, and recursive zerofinding.”
- MATLAB Differentiation Matrix Suite
“It includes functions for computing differentiation matrices of arbitrary order corresponding to Chebyshev, Hermite, Laguerre, Fourier, and sinc interpolants. It also includes FFT-based routines for Fourier, Chebyshev and sinc differentiation. Auxiliary functions are included for incorporating boundary conditions, performing interpolation using barycentric formulas, and computing roots of orthogonal polynomials.”
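As a rough illustration of what a differentiation matrix is, the sketch below builds the standard Chebyshev collocation differentiation matrix on the points x_j = cos(pi*j/n). It is not code from the suite; the names `cheb` and `matvec` are hypothetical, and the construction follows the textbook formula with the diagonal chosen so that each row sums to zero.

```python
import math

def cheb(n):
    """Return (D, x): the (n+1)x(n+1) Chebyshev differentiation matrix
    and the Chebyshev points x_j = cos(pi*j/n)."""
    if n == 0:
        return [[0.0]], [1.0]
    x = [math.cos(math.pi * j / n) for j in range(n + 1)]
    # Weights: 2 at the two endpoints, 1 in the interior, alternating sign.
    c = [(2.0 if j in (0, n) else 1.0) * (-1.0) ** j for j in range(n + 1)]
    D = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if i != j:
                D[i][j] = (c[i] / c[j]) / (x[i] - x[j])
        # Diagonal entry: the derivative of a constant must vanish,
        # so every row of D has to sum to zero.
        D[i][i] = -sum(D[i][j] for j in range(n + 1) if j != i)
    return D, x

def matvec(D, v):
    """Multiply the dense matrix D (list of rows) by the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in D]

# Differentiate u(x) = x**3 on 9 Chebyshev points.
D, x = cheb(8)
du = matvec(D, [xi ** 3 for xi in x])
```

Because the interpolant of a polynomial of degree at most n is the polynomial itself, `du` agrees with 3*x**2 at the nodes up to rounding error.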
- EigTool
“Eigenvalue analysis of non-hermitian matrices and operators can be misleading: Predictions often fail to match observations. Specifically, trouble may arise when the associated sets of eigenvectors are ill-conditioned with respect to the norm of applied interest. In the case of the familiar Euclidean or 2-norm, this means that the matrix or operator is non-normal, and the eigenvectors are not orthogonal. Pseudospectra provide an analytical and graphical alternative for investigating non-normal matrices and operators.”
- INTLAB
“INTLAB is the MATLAB toolbox for self-validating algorithms. It comprises of interval arithmetic for real and complex data including vectors and matrices (very fast), interval arithmetic for real and complex sparse matrices (very fast), automatic differentiation (forward mode, vectorized computations, fast), Gradients to solve systems of nonlinear equations, Hessians for global optimization, automatic slopes (sequential approach, slow for many variables), univariate and multivariate (interval) polynomials, rigorous real interval standard functions (fast, very accurate, ~3 ulps), rigorous complex interval standard functions (fast, rigorous, but not necessarily sharp inclusions), rigorous input/output, accurate summation, dot product and matrix-vector residuals (interpreted, but fairly fast), multiple precision interval arithmetic with error bounds (does the job, slow).”
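To give a feel for forward-mode automatic differentiation, one of the features listed above, here is a minimal sketch based on dual numbers. It is written in Python rather than MATLAB and is not INTLAB's implementation or interface; the class `Dual` and the helpers `sin` and `derivative` are hypothetical names, and only addition, multiplication, and sine are supported.

```python
import math

class Dual:
    """A number value + deriv*eps with eps**2 == 0; deriv carries the derivative."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        """Treat plain numbers as constants (derivative zero)."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def sin(x):
    """Sine with the chain rule applied to the derivative part."""
    x = Dual.lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

def derivative(f, x0):
    """Evaluate f'(x0) by seeding the input with derivative 1."""
    return f(Dual(x0, 1.0)).deriv

def f(x):
    return x * x * x + 2 * x + sin(x)

slope = derivative(f, 1.5)  # analytically 3*1.5**2 + 2 + cos(1.5)
```

The derivative is propagated alongside the value through every operation, so no symbolic manipulation or finite differencing is involved.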
- Variable Precision Integer Arithmetic
“Arithmetic with integers of fully arbitrary size. Arrays and vectors of vpi numbers are supported.”
- MPSpack
“MPSpack is a user-friendly and fully object-oriented MATLAB toolbox that implements the method of particular solutions, nonpolynomial FEM, and related boundary methods (e.g. fundamental solutions, layer potentials) for efficient and highly accurate solution of Laplace eigenvalue problems, interior/exterior Helmholtz boundary-value problems (e.g. wave scattering), and related PDE problems, on piecewise-homogeneous 2D domains.”
- MAGMA
“The Matrix Algebra on GPU and Multicore Architectures (MAGMA) project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current “Multicore+GPU” systems.
The MAGMA research is based on the idea that, to address the complex challenges of the emerging hybrid environments, optimal software solutions will themselves have to hybridize, combining the strengths of different algorithms within a single framework. Building on this idea, we aim to design linear algebra algorithms and frameworks for hybrid manycore and GPUs systems that can enable applications to fully exploit the power that each of the hybrid components offers.”
- PLASMA
“The Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) project aims to address the critical and highly disruptive situation that is facing the Linear Algebra and High Performance Computing community due to the introduction of multi-core architectures.
PLASMA’s ultimate goal is to create software frameworks that enable programmers to simplify the process of developing applications that can achieve both high performance and portability across a range of new architectures.
The development of programming models that enforce asynchronous, out of order scheduling of operations is the concept used as the basis for the definition of a scalable yet highly efficient software framework for Computational Linear Algebra applications.”
- CUDPP
“CUDPP is a library of data-parallel algorithm primitives such as parallel prefix-sum (“scan”), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.”
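The relationship between scan and stream compaction mentioned in the description can be sketched in a few lines. The Python below is sequential and only illustrates the data flow; it is not CUDPP's API, and the function names `exclusive_scan` and `compact` are hypothetical. On a GPU, both the scan and the final scatter would run in parallel.

```python
def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out

def compact(items, keep):
    """Stream compaction: keep the items satisfying the predicate, in order.

    A scan over the 0/1 keep-flags gives each surviving item its output
    position, so every item can be written independently of the others.
    """
    flags = [1 if keep(x) else 0 for x in items]
    positions = exclusive_scan(flags)
    total = positions[-1] + flags[-1] if items else 0
    out = [None] * total
    for x, flag, pos in zip(items, flags, positions):
        if flag:
            out[pos] = x  # scatter step
    return out

offsets = exclusive_scan([3, 1, 7, 0, 4])
positives = compact([5, -2, 8, -1, 3], lambda x: x > 0)
```

The scatter loop has no dependence between iterations once `positions` is known, which is what makes the scan a useful building block for data-parallel code.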
- cudasm, decuda
“cudasm is an assembler for the NVIDIA G8x architecture of Graphics Processing Units (GPUs). It allows writing and optimizing code specifically for the G8x and G9x series, and provides a (basic) independent toolchain for this hardware. It takes a text file with assembly instructions as input, and produces a .cubin file as output.
decuda is a disassembler for the NVIDIA CUDA binary (.cubin) format. It provides insight into the internal instructions generated for the G8x and G9x architectures. Also, it can help in finding bottlenecks, as you can see what parts of your algorithm require a lot of actual instructions. It has an option to generate a format that is compatible with cudasm to make it possible to hand-optimize kernels.”
- SpMV4GPU
“SpMV4GPU is a sparse matrix-vector multiplication library optimized for the NVIDIA GPUs. It is developed using the NVIDIA CUDA interfaces, and works on all NVIDIA GPUs that support this library. SpMV4GPU uses the standard sparse matrix storage formats, such as compressed row and column storage formats. It hides the intricacies of GPU programming by using an abstract interface. The SpMV4GPU interface also allows users to provide optional performance hints, and optionally use special storage representations.
The SpMV4GPU package is provided as a library in a binary format, currently for x86/Linux. The application invokes the library and links it using the CUDA runtime system provided by NVIDIA. The package supports all versions of CUDA above version 2.0.
This CUDA implementation of the SpMV kernel employs both compile-time and run-time optimizations. The compile-time optimizations include:
1. Exploiting synchronization-free parallelism
2. Optimized thread mapping based on the affinity towards optimal memory access pattern
3. Optimal off-chip memory access to tolerate the high latency
4. Exploiting data reuse.
The runtime optimizations involve a runtime inspection of the sparse matrix to determine dense non-zero sub-blocks, which facilitate the reuse of input vector elements during execution. Internally, the library uses a new blocked storage format for storing and accessing elements of a sparse matrix in an optimized manner from the GPU memories. These optimizations result in performance improvement by a factor of two to four over the NVIDIA CUDPP library.”
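For readers unfamiliar with the compressed row storage format mentioned in the description, the following sketch shows the layout and a plain sequential matrix-vector product over it. It is not SpMV4GPU's interface or its blocked format; the function names `dense_to_csr` and `csr_matvec` are hypothetical, and a GPU kernel would typically assign rows (or parts of rows) to threads instead of looping.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of rows) to compressed row storage.

    values  : the nonzero entries, row by row
    col_idx : the column index of each stored entry
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in compressed row storage."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect read of x
        y.append(acc)
    return y

A = [[4, 0, 0],
     [0, 0, 2],
     [1, 3, 0]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3])
```

The indirect access `x[col_idx[k]]` is the part that makes memory access patterns and input-vector reuse matter so much for performance.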
- Ocelot
“Ocelot is a dynamic compilation framework for heterogeneous systems, accomplishing this by providing various backend targets for CUDA programs. Ocelot currently allows CUDA programs to be executed on NVIDIA GPUs and x86-CPUs at full speed without recompilation.”
- CULA
“CULA is EM Photonics’ GPU-accelerated numerical linear algebra library that contains a growing list of LAPACK functions.”
The basic edition (free) contains the most popular linear algebra routines including LU Decomposition, QR Factorization, Singular Value Decomposition, and Least Squares, all in single precision.
- GPUmat
GPU toolbox for MATLAB
- GMAC
“GMAC is a user-level library that implements an Asymmetric Distributed Shared Memory (ADSM) model to be used by CUDA programs. An ADSM model allows CPU code to access data hosted in accelerator (GPU) memory.”
- Cilk++
“The Intel Cilk++ SDK is an extension to C++ that offers a quick, easy and reliable way to improve the performance of C++ programs on multicore processors. The Cilk++ suite, acquired from Cilk Arts in August, 2009, offers support for programmers using the GCC compiler for Linux or the Microsoft C++ compiler for Windows. Cilk++ includes compiler support, runtime libraries, the Cilkscreen Race Detector and the Cilkview Scalability Analysis and Performance Tuning tools. The three Cilk++ keywords provide a simple yet surprisingly powerful model for parallel programming, while runtime and template libraries offer a well-tuned environment for building parallel applications.”
Here are three related video lectures taped at MIT:
1. Multicore Programming Workshop – Lecture 1
2. Concepts in Multicore Programming – Lecture 2: Parallelism and Scheduling Theory
3. Concepts in Multicore Programming – Lecture 3: Analysis of Multithreaded Algorithms
- pMatlab
“pMatlab provides a set of MATLAB data structures and functions that implement distributed MATLAB arrays. Parallel array programming has proven to be an effective programming style for a wide variety of parallel applications and is consistent with standard MATLAB programming style. The primary advantages of distributed array programming are:
Message passing is done implicitly
Existing Matlab program can be made parallel with modifications to a handful of statements”
- Condor
“Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.
While providing functionality similar to that of a more traditional batch queueing system, Condor’s novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a “Beowulf” cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (such as a key press detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate a job to a different machine which would otherwise be idle. Condor does not require a shared file system across machines – if no shared file system is available, Condor can transfer the job’s data files on behalf of the user, or Condor may be able to transparently redirect all the job’s I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization’s computational power into one resource.”
- SuiteSparse
“AMD: symmetric approximate minimum degree
BTF: permutation to block triangular form
CAMD: symmetric approximate minimum degree
CCOLAMD: constrained column approximate minimum degree
COLAMD: column approximate minimum degree
CHOLMOD: sparse supernodal Cholesky factorization and update/downdate
CSparse: a concise sparse matrix package
CXSparse: an extended version of CSparse
KLU: sparse LU factorization, for circuit simulation
LDL: a simple LDL^T factorization
UMFPACK: sparse multifrontal LU factorization
RBio: MATLAB toolbox for reading/writing sparse matrices
UFconfig: common configuration for all but CSparse
LINFACTOR: solve Ax=b using LU or chol
MESHND: 2D and 3D mesh generation and nested dissection
SSMULT: sparse matrix times sparse matrix
SuiteSparseQR: multifrontal sparse QR”
- deal.II
“deal.II is a C++ program library targeted at the computational solution of partial differential equations using adaptive finite elements. It uses state-of-the-art programming techniques to offer you a modern interface to the complex data structures and algorithms required.”
Here are some of the main features of deal.II:
“Support for one, two, and three space dimensions, using a unified interface that allows to write programs almost dimension independent.
Handling of locally refined grids, including different adaptive refinement strategies based on local error indicators and error estimators.
h, p, and hp refinements are fully supported for continuous and discontinuous elements.
Support for a variety of finite elements: Lagrange elements of any order, continuous and discontinuous; Nedelec and Raviart-Thomas elements of any order; elements composed of other elements.
Fast algorithms that enable you to solve problems with up to several millions of degrees of freedom quickly. As opposed to programming symbolic algebra packages the penalty for readability is low.
A complete stand-alone linear algebra library including sparse matrices, vectors, Krylov subspace solvers, support for blocked systems, and interface to other packages such as PETSc and METIS.”
- NekTar
Navier-Stokes Solver
- Semtex
“Semtex is a ‘classical’ quadrilateral spectral element DNS code that uses the standard nodal GLL basis functions and (optionally) Fourier expansions in a homogeneous direction to provide three-dimensional solutions.”
- Triangle
“Triangle generates exact Delaunay triangulations, constrained Delaunay triangulations, conforming Delaunay triangulations, Voronoi diagrams, and high-quality triangular meshes. The latter can be generated with no small or large angles, and are thus suitable for finite element analysis.”
- DistMesh
“DistMesh is a simple MATLAB code for generation of unstructured triangular and tetrahedral meshes. One reason that the code is short and simple is that the geometries are specified by Signed Distance Functions. These give the shortest distance from any point in space to the boundary of the domain. The sign is negative inside the region and positive outside. A simple example is the unit circle in 2-D, which has the distance function d = r - 1, where r is the distance from the origin. For more complicated geometries the distance function can be computed by interpolation between values on a grid, a common representation for level set methods. For the actual mesh generation, DistMesh uses the Delaunay triangulation routine in MATLAB and tries to optimize the node locations by a force-based smoothing procedure. The topology is regularly updated by Delaunay. The boundary points are only allowed to move tangentially to the boundary by projections using the distance function. This iterative procedure typically results in very well-shaped meshes.”
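The signed distance idea is easy to try out. For the unit circle, the function d = r - 1 matches the sign convention in the description (negative inside, positive outside). The Python sketch below is not DistMesh code; the names `d_circle` and `project_to_boundary` are hypothetical, and the projection step, which uses a central-difference gradient, is just one plausible way to move a point onto the boundary using only the distance function.

```python
import math

def d_circle(x, y, radius=1.0):
    """Signed distance to a circle centred at the origin:
    negative inside, zero on the boundary, positive outside."""
    return math.hypot(x, y) - radius

def project_to_boundary(x, y, d=d_circle, h=1e-6):
    """Move (x, y) onto the zero level set of d along the gradient of d.

    The gradient is estimated by central differences, so d is only ever
    evaluated, never differentiated analytically. Points where the
    gradient vanishes (such as the circle's centre) are not handled.
    """
    dist = d(x, y)
    gx = (d(x + h, y) - d(x - h, y)) / (2 * h)
    gy = (d(x, y + h) - d(x, y - h)) / (2 * h)
    norm2 = gx * gx + gy * gy
    return x - dist * gx / norm2, y - dist * gy / norm2

inside = d_circle(0.3, 0.4)             # r = 0.5, so the point is inside
px, py = project_to_boundary(0.3, 0.4)  # moved radially towards the boundary
```

Any geometry that can supply such a function, whether analytically or by interpolating grid values, can be handled the same way.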
- ALGLIB
“ALGLIB is a cross-platform numerical analysis and data processing library. ALGLIB aims to be highly portable: it supports several programming languages (C++, C# and other languages); it may be compiled with a wide variety of compilers and was tested under a wide variety of platforms. Every algorithm is represented by programs in several programming languages and the list of languages is the same for all the algorithms. This is the main difference between ALGLIB and other libraries: one algorithm, several programming languages, identical functionality in each language.”
- GSL
“The GNU Scientific Library (GSL) is a collection of routines for numerical computing. The routines have been written from scratch in C, and present a modern Applications Programming Interface (API) for C programmers, allowing wrappers to be written for very high level languages. The source code is distributed under the GNU General Public License.
The library covers a wide range of topics in numerical computing. Routines are available for the following areas:
Complex Numbers, Roots of Polynomials, Special Functions, Vectors and Matrices, Permutations, Combinations, Sorting, BLAS Support, Linear Algebra, CBLAS Library, Fast Fourier Transforms, Eigensystems, Random Numbers, Quadrature, Random Distributions, Quasi-Random Sequences, Histograms, Statistics, Monte Carlo Integration, N-Tuples, Differential Equations, Simulated Annealing, Numerical Differentiation, Interpolation, Series Acceleration, Chebyshev Approximations, Root-Finding, Discrete Hankel Transforms, Least-Squares Fitting, Minimization, IEEE Floating-Point, Physical Constants, Basis Splines, and Wavelets.”
- Trilinos
“The Trilinos Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries. In particular, the goal is to develop parallel solver algorithms and libraries within an object oriented software framework for the solution of large-scale, complex multiphysics engineering and scientific applications. Our emphasis is on developing robust, scalable algorithms in a software framework, using abstract interfaces for flexible interoperability of components while providing a full-featured set of concrete classes that implement all abstract interfaces. Trilinos uses a two-level software structure designed around collections of packages. A Trilinos package is an integral unit usually developed by a small team of experts in a particular algorithms area such as algebraic preconditioners, nonlinear solvers, etc. Packages exist underneath the Trilinos top level, which provides a common look-and-feel, including configuration, documentation, licensing, and bug-tracking.”
- NEOS
An optimization server
- POV-Ray
“The Persistence of Vision Raytracer (POV-Ray) creates three-dimensional, photo-realistic images using a rendering technique called ray-tracing. It reads in a text file containing information describing the objects and lighting in a scene and generates an image of that scene from the view point of a camera also described in the text file. Ray-tracing is not a fast process by any means, but it produces very high quality images with realistic reflections, shading, perspective and other effects.”
- VisIt
“VisIt is a free interactive parallel visualization and graphical analysis tool for viewing scientific data on Unix and PC platforms. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images for presentations. VisIt contains a rich set of visualization features so that you can view your data in a variety of ways. It can be used to visualize scalar and vector fields defined on two- and three-dimensional (2D and 3D) structured and unstructured meshes. VisIt was designed to handle very large data set sizes in the terascale range and yet can also handle small data sets in the kilobyte range. See the table below for more details about the tool’s features.”
- ParaView
“ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or programmatically using ParaView’s batch processing capabilities.
ParaView was developed to analyze extremely large datasets using distributed memory computing resources. It can be run on supercomputers to analyze datasets of terascale as well as on laptops for smaller data.”
