Assume we want to write a templated multiplication function for matrices:
template <typename MatrixA, typename MatrixB, typename MatrixC>
void mult(MatrixA const& a, MatrixB const& b, MatrixC& c)
{
    /* ... */
}
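Dense matrix multiplication is the first operation to which all the techniques on this page are applied. We plan to extend other operations in the same manner. The first step is to implement the operation as a functor, i.e. a class with an application operator, instead of a function: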
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct mult_ft
{
    void operator()(MatrixA const& a, MatrixB const& b, MatrixC& c)
    {
        /* ... */
    }
};
An object of this class
mult_ft<matrix_a_type, matrix_b_type, matrix_c_type> mult;
can be called like a function. Admittedly, the definition of this functor is not very elegant. It is nevertheless necessary in order to support composition and partial specialization, and the techniques described below minimize the impact on the user.
Remark: the suffix "_ft" stands for fully templated, in contrast to functor classes where all or part of the types are automatically instantiated, as shown in step x.
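To make the call syntax concrete, here is a minimal, self-contained sketch. The type mat2 and the function functor_call_demo are hypothetical names invented for this illustration and are not part of MTL4; the loop nest is a plain textbook product, not the library's implementation.

```cpp
#include <cassert>

// Hypothetical 2x2 matrix used only in this sketch (not an MTL4 type).
struct mat2 {
    double v[4];       // row-major storage
    mat2() : v() {}    // zero-initialize all entries
    double& operator()(int r, int c) { return v[2 * r + c]; }
    double operator()(int r, int c) const { return v[2 * r + c]; }
};

// Fully templated functor: all three matrix types are class template parameters.
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct mult_ft {
    void operator()(MatrixA const& a, MatrixB const& b, MatrixC& c) const {
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j) {
                c(i, j) = 0;
                for (int k = 0; k < 2; ++k)
                    c(i, j) += a(i, k) * b(k, j);
            }
    }
};

// Multiplies two small matrices through a functor object and returns c(0, 0).
inline double functor_call_demo() {
    mat2 a, b, c;
    a(0, 0) = 1; a(0, 1) = 2; a(1, 0) = 3; a(1, 1) = 4;
    b(0, 0) = 5; b(0, 1) = 6; b(1, 0) = 7; b(1, 1) = 8;

    mult_ft<mat2, mat2, mat2> mult;  // an object of the functor class
    mult(a, b, c);                   // called like a function

    assert(c(1, 1) == 50.0);         // 3*6 + 4*8
    return c(0, 0);                  // 1*5 + 2*7
}
```

The price of the functor form is visible in the declaration of mult: all three matrix types have to be spelled out before the object can be called.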
template <>
struct mult_ft<matrix_a_type, matrix_b_type, matrix_c_type>
{
    void operator()(matrix_a_type const& a, matrix_b_type const& b, matrix_c_type& c)
    {
        /* Faster code for this type triplet ... */
    }
};
Please note that specializations need not be written in the same file as the primary template (i.e. by the same author). They can be added in any file that is included in the translation unit, provided they are declared before the first use that would instantiate the primary template for these types.
By the way, this explicit form of specialization is also supported for function templates (but the techniques that follow, such as partial specialization, are not).
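The following sketch illustrates explicit specialization for both a functor and a function template. The tags dense_matrix and sparse_matrix, the function describe and the helper functions are hypothetical and exist only for this illustration. The functors return a label instead of multiplying so that the selected code path can be observed.

```cpp
#include <cassert>
#include <string>

// Hypothetical matrix tags used only in this sketch.
struct dense_matrix {};
struct sparse_matrix {};

// Primary template: the generic code path.
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct mult_ft {
    std::string operator()(MatrixA const&, MatrixB const&, MatrixC&) const
    { return "generic"; }
};

// Explicit specialization for exactly one type triplet.
template <>
struct mult_ft<dense_matrix, dense_matrix, dense_matrix> {
    std::string operator()(dense_matrix const&, dense_matrix const&, dense_matrix&) const
    { return "dense triplet"; }
};

// Function templates can be explicitly specialized as well ...
template <typename Matrix>
std::string describe(Matrix const&) { return "generic function"; }

template <>
inline std::string describe<dense_matrix>(dense_matrix const&) { return "dense function"; }
// ... but C++ has no partial specialization of function templates.

inline std::string dense_path() {
    dense_matrix a, b, c;
    mult_ft<dense_matrix, dense_matrix, dense_matrix> mult;
    return mult(a, b, c);
}

inline std::string mixed_path() {
    sparse_matrix a;
    dense_matrix b, c;
    mult_ft<sparse_matrix, dense_matrix, dense_matrix> mult;
    return mult(a, b, c);
}

inline void explicit_specialization_demo() {
    assert(dense_path() == "dense triplet");  // exact match: specialization
    assert(mixed_path() == "generic");        // any other triplet: primary template
    assert(describe(dense_matrix()) == "dense function");
    assert(describe(sparse_matrix()) == "generic function");
}
```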
template <typename ValueA, typename ParaA, typename ValueB, typename ParaB, typename ValueC, typename ParaC>
struct mult_ft<dense2D<ValueA, ParaA>, dense2D<ValueB, ParaB>, dense2D<ValueC, ParaC> >
{
    void operator()(dense2D<ValueA, ParaA> const& a, dense2D<ValueB, ParaB> const& b, dense2D<ValueC, ParaC>& c)
    {
        /* Faster code for this set of type triplets ... */
    }
};
Again, such specializations can be added later. This becomes very handy when users define their own (matrix) types and can also provide specialized implementations for certain functions or operators which are implemented in terms of functors.
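A small sketch of how partial specialization and a later user-provided specialization interact. Here dense2D is a simplified stand-in with the same name as the MTL4 class, and user_matrix and path are hypothetical names; none of this is the library's actual code. The functors return labels so that the dispatch can be observed.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins used only in this sketch (not the MTL4 classes).
template <typename Value, typename Para = void> struct dense2D {};
struct user_matrix {};

// Primary template: the generic code path.
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct mult_ft {
    std::string operator()(MatrixA const&, MatrixB const&, MatrixC&) const
    { return "generic"; }
};

// Partial specialization: matches every triplet of dense2D types,
// whatever their value types and parameters are.
template <typename ValueA, typename ParaA, typename ValueB, typename ParaB,
          typename ValueC, typename ParaC>
struct mult_ft<dense2D<ValueA, ParaA>, dense2D<ValueB, ParaB>, dense2D<ValueC, ParaC> > {
    std::string operator()(dense2D<ValueA, ParaA> const&, dense2D<ValueB, ParaB> const&,
                           dense2D<ValueC, ParaC>&) const
    { return "dense2D"; }
};

// Added later, for example in a user's own header, for a user-defined type.
template <>
struct mult_ft<user_matrix, user_matrix, user_matrix> {
    std::string operator()(user_matrix const&, user_matrix const&, user_matrix&) const
    { return "user"; }
};

// Helper: reports which implementation is selected for a type triplet.
template <typename A, typename B, typename C>
std::string path() {
    A a; B b; C c;
    return mult_ft<A, B, C>()(a, b, c);
}

inline void partial_specialization_demo() {
    std::string mixed = path<dense2D<float>, dense2D<double>, dense2D<double> >();
    std::string user  = path<user_matrix, user_matrix, user_matrix>();
    std::string other = path<user_matrix, dense2D<double>, dense2D<double> >();
    assert(mixed == "dense2D");  // value types may differ
    assert(user == "user");      // specialization supplied by the user
    assert(other == "generic");  // no specialization matches
}
```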
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct blas_mult_ft
{
    void operator()(MatrixA const& a, MatrixB const& b, MatrixC& c)
    {
        mult_ft<MatrixA, MatrixB, MatrixC>()(a, b, c);
    }
};

template <typename ParaA, typename ParaB, typename ParaC>
struct blas_mult_ft<dense2D<double, ParaA>, dense2D<double, ParaB>, dense2D<double, ParaC> >
{
    void operator()(const dense2D<double, ParaA>& a, const dense2D<double, ParaB>& b, dense2D<double, ParaC>& c)
    {
        /* ... _dgemm( ... only 13 arguments ...); */
    }
};

/* ... more specializations */
This code works, but we can write it more elegantly with public inheritance:
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct blas_mult_ft
  : public mult_ft<MatrixA, MatrixB, MatrixC>
{};

/* ... here come the specializations */
This version is not only shorter but can also reduce the compilation cost. For details, see the discussion of metafunction forwarding in David Abrahams' book.
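The inheritance-based fallback can be tried out with the following sketch. As an assumption for illustration, dense2D is reduced to a one-parameter stand-in, the specialization returns a label where the real code would call BLAS, and path_for is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a dense matrix type (not the MTL4 class).
template <typename Value> struct dense2D {};

// General-purpose functor.
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct mult_ft {
    std::string operator()(MatrixA const&, MatrixB const&, MatrixC&) const
    { return "generic"; }
};

// Primary template: no code of its own, operator() is inherited from mult_ft.
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct blas_mult_ft : public mult_ft<MatrixA, MatrixB, MatrixC> {};

// Specialization standing in for a BLAS call (no BLAS is called in this sketch).
template <>
struct blas_mult_ft<dense2D<double>, dense2D<double>, dense2D<double> > {
    std::string operator()(dense2D<double> const&, dense2D<double> const&,
                           dense2D<double>&) const
    { return "blas"; }
};

// Helper: reports the implementation chosen for dense2D<Value> operands.
template <typename Value>
std::string path_for() {
    dense2D<Value> a, b, c;
    return blas_mult_ft<dense2D<Value>, dense2D<Value>, dense2D<Value> >()(a, b, c);
}

inline void forwarding_demo() {
    assert(path_for<double>() == "blas");    // handled by the specialization
    assert(path_for<float>() == "generic");  // inherited from mult_ft
}
```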
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct blas_mult_ft
  : public mult_ft<MatrixA, MatrixB, MatrixC>
{};

#ifdef MTL_HAS_BLAS
/* ... here come the specializations */
#endif // MTL_HAS_BLAS
If BLAS is not available to MTL4 (MTL_HAS_BLAS is not defined), programs calling the BLAS functor still work, although not necessarily as fast.
In fact, if you call an MTL4 functor, you are guaranteed that the operation is performed correctly. If a functor with an optimized implementation cannot handle a certain type tuple, it calls another functor that can. If that functor cannot handle the tuple either, it calls yet another functor in turn, and so on.
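The effect of the conditional compilation can be sketched as follows. The macro SKETCH_HAS_BLAS is a hypothetical stand-in for MTL_HAS_BLAS so that the example does not depend on an MTL4 installation; dense2D is a simplified stand-in and the labels replace the actual computations.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a dense matrix type (not the MTL4 class).
template <typename Value> struct dense2D {};

template <typename MatrixA, typename MatrixB, typename MatrixC>
struct mult_ft {
    std::string operator()(MatrixA const&, MatrixB const&, MatrixC&) const
    { return "generic"; }
};

// Always available: falls back to mult_ft for every type triplet.
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct blas_mult_ft : public mult_ft<MatrixA, MatrixB, MatrixC> {};

#ifdef SKETCH_HAS_BLAS  // hypothetical macro playing the role of MTL_HAS_BLAS
// Only compiled when the library is available.
template <>
struct blas_mult_ft<dense2D<double>, dense2D<double>, dense2D<double> > {
    std::string operator()(dense2D<double> const&, dense2D<double> const&,
                           dense2D<double>&) const
    { return "blas"; }
};
#endif // SKETCH_HAS_BLAS

template <typename Value>
std::string path_for() {
    dense2D<Value> a, b, c;
    return blas_mult_ft<dense2D<Value>, dense2D<Value>, dense2D<Value> >()(a, b, c);
}

inline void conditional_demo() {
    // The calling code is identical in both configurations.
#ifdef SKETCH_HAS_BLAS
    assert(path_for<double>() == "blas");
#else
    assert(path_for<double>() == "generic");
#endif
    assert(path_for<float>() == "generic");
}
```

Compiling the same file with -DSKETCH_HAS_BLAS switches the double case to the specialized path without any change to the calling code.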
All we need for this is an additional template parameter for the default functionality:
template <typename MatrixA, typename MatrixB, typename MatrixC,
          typename Backup= mult_ft<MatrixA, MatrixB, MatrixC> >
struct blas_mult_ft
  : public Backup
{};
The parameter for the default functor can of course have a default value, as in the example. The name "Backup" reflects that each functor implements its functionality only for a certain set of type tuples; type tuples outside this set are handled by the Backup functor. In theory, such functors can be composed arbitrarily. Since this is syntactically somewhat cumbersome, we will give examples later.
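A sketch of the Backup parameter in action, under the following assumptions: dense2D is a simplified stand-in, tiled_mult_ft is a hypothetical second general-purpose functor, and all functors return labels instead of computing a product.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a dense matrix type (not the MTL4 class).
template <typename Value> struct dense2D {};

template <typename MatrixA, typename MatrixB, typename MatrixC>
struct mult_ft {
    std::string operator()(MatrixA const&, MatrixB const&, MatrixC&) const
    { return "generic"; }
};

// Hypothetical alternative implementation that can serve as backup.
template <typename MatrixA, typename MatrixB, typename MatrixC>
struct tiled_mult_ft {
    std::string operator()(MatrixA const&, MatrixB const&, MatrixC&) const
    { return "tiled"; }
};

// Type tuples without a specialization are handled by Backup.
template <typename MatrixA, typename MatrixB, typename MatrixC,
          typename Backup = mult_ft<MatrixA, MatrixB, MatrixC> >
struct blas_mult_ft : public Backup {};

// Stand-in for the BLAS specialization; Backup is not needed here.
template <typename Backup>
struct blas_mult_ft<dense2D<double>, dense2D<double>, dense2D<double>, Backup> {
    std::string operator()(dense2D<double> const&, dense2D<double> const&,
                           dense2D<double>&) const
    { return "blas"; }
};

template <typename Value>
std::string default_backup_path() {
    typedef dense2D<Value> M;
    M a, b, c;
    return blas_mult_ft<M, M, M>()(a, b, c);
}

template <typename Value>
std::string tiled_backup_path() {
    typedef dense2D<Value> M;
    M a, b, c;
    // The matrix types must be repeated for the backup functor.
    return blas_mult_ft<M, M, M, tiled_mult_ft<M, M, M> >()(a, b, c);
}

inline void backup_demo() {
    assert(default_backup_path<float>() == "generic");
    assert(tiled_backup_path<float>() == "tiled");
    assert(tiled_backup_path<double>() == "blas");  // backup is never consulted
}
```

The repeated matrix types in tiled_backup_path show why composing fully templated functors by hand quickly becomes verbose.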
template <typename Backup>
struct blas_mult_t
  : public Backup
{
    template <typename MatrixA, typename MatrixB, typename MatrixC>
    void operator()(MatrixA const& a, MatrixB const& b, MatrixC& c)
    {
        blas_mult_ft<MatrixA, MatrixB, MatrixC, Backup>()(a, b, c);
    }
};
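The functor above moves the matrix types from the class to the application operator, so that they are deduced at each call. The following sketch shows the consequence for the caller: one object serves several type combinations. The names mult_t and deduction_demo, the simplified dense2D and the returned labels are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a dense matrix type (not the MTL4 class).
template <typename Value> struct dense2D {};

// Partially templated general-purpose functor: types are deduced per call.
struct mult_t {
    template <typename MatrixA, typename MatrixB, typename MatrixC>
    std::string operator()(MatrixA const&, MatrixB const&, MatrixC&) const
    { return "generic"; }
};

// Fully templated functor with backup.
template <typename MatrixA, typename MatrixB, typename MatrixC, typename Backup>
struct blas_mult_ft : public Backup {};

// Stand-in for the BLAS specialization.
template <typename Backup>
struct blas_mult_ft<dense2D<double>, dense2D<double>, dense2D<double>, Backup> {
    std::string operator()(dense2D<double> const&, dense2D<double> const&,
                           dense2D<double>&) const
    { return "blas"; }
};

// Partially templated wrapper: forwards to the fully templated functor
// once the matrix types are known.
template <typename Backup>
struct blas_mult_t : public Backup {
    template <typename MatrixA, typename MatrixB, typename MatrixC>
    std::string operator()(MatrixA const& a, MatrixB const& b, MatrixC& c) const {
        return blas_mult_ft<MatrixA, MatrixB, MatrixC, Backup>()(a, b, c);
    }
};

inline void deduction_demo() {
    blas_mult_t<mult_t> mult;  // no matrix types spelled out
    dense2D<double> da, db, dc;
    dense2D<float> fa, fb, fc;
    assert(mult(da, db, dc) == "blas");     // types deduced: specialization
    assert(mult(fa, fb, fc) == "generic");  // types deduced: backup
}
```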
Before we finally come to some examples, we want to introduce another template parameter. This leads us to the actual implementation of the functors, for instance the BLAS functor:
template <typename MatrixA, typename MatrixB, typename MatrixC,
          typename Assign= assign::assign_sum,
          typename Backup= gen_dmat_dmat_mult_t<Assign> >
struct gen_blas_dmat_dmat_mult_ft
  : public Backup
{};

/* ... its specializations */

template <typename Assign= assign::assign_sum,
          typename Backup= gen_dmat_dmat_mult_t<Assign> >
struct gen_blas_dmat_dmat_mult_t
  : public Backup
{
    template <typename MatrixA, typename MatrixB, typename MatrixC>
    void operator()(MatrixA const& a, MatrixB const& b, MatrixC& c)
    {
        gen_blas_dmat_dmat_mult_ft<MatrixA, MatrixB, MatrixC, Assign, Backup>()(a, b, c);
    }
};
The parameter Assign allows C= A*B, C+= A*B, and C-= A*B to be realized with the same implementation (an explanation will follow) by setting Assign to assign::assign_sum, assign::plus_sum, and assign::minus_sum, respectively. At this point we focus on the composition.
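One possible way such an Assign parameter can work is sketched below. The interface of the policy classes (static init and update functions) is an assumption made for this illustration and not necessarily the interface of the MTL4 classes; the namespace sketch, the type matrix and the function entry_after are hypothetical as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical policies: init() prepares an entry of C,
// update() accumulates one term of the product into it.
struct assign_sum {   // C = A * B
    template <typename T> static void init(T& c) { c = T(0); }
    template <typename T> static void update(T& c, T const& x) { c += x; }
};
struct plus_sum {     // C += A * B
    template <typename T> static void init(T&) {}
    template <typename T> static void update(T& c, T const& x) { c += x; }
};
struct minus_sum {    // C -= A * B
    template <typename T> static void init(T&) {}
    template <typename T> static void update(T& c, T const& x) { c -= x; }
};

// Minimal square matrix with row-major storage.
struct matrix {
    std::size_t n;
    std::vector<double> v;
    explicit matrix(std::size_t n_, double fill = 0.0) : n(n_), v(n_ * n_, fill) {}
    double& operator()(std::size_t r, std::size_t c) { return v[r * n + c]; }
    double operator()(std::size_t r, std::size_t c) const { return v[r * n + c]; }
};

// One loop nest serves all three operations; only Assign differs.
template <typename Assign = assign_sum>
struct mult_t {
    void operator()(matrix const& a, matrix const& b, matrix& c) const {
        for (std::size_t i = 0; i < c.n; ++i)
            for (std::size_t j = 0; j < c.n; ++j) {
                Assign::init(c(i, j));
                for (std::size_t k = 0; k < c.n; ++k)
                    Assign::update(c(i, j), a(i, k) * b(k, j));
            }
    }
};

// A is all ones, B is all twos, C starts with all tens (2 by 2 each).
// Returns C(0, 0) after applying the product with the given policy.
template <typename Assign>
double entry_after() {
    matrix a(2, 1.0), b(2, 2.0), c(2, 10.0);
    mult_t<Assign>()(a, b, c);
    return c(0, 0);
}

inline void assign_demo() {
    assert(entry_after<assign_sum>() == 4.0);   // C  = A*B: 1*2 + 1*2
    assert(entry_after<plus_sum>() == 14.0);    // C += A*B: 10 + 4
    assert(entry_after<minus_sum>() == 6.0);    // C -= A*B: 10 - 4
}

} // namespace sketch
```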
The duality of fully and partially templated functors simplifies the syntax of composed functors significantly. Even the default type of the backup functor benefits from the shorter syntax, as shown in the example above.
All these functors have a Backup parameter, which is by default set to the canonical implementation with iterators. The two canonical products support all combinations of matrix types; their Backup parameter is only added to unify the interface.
The Backup parameter only needs to be set if an implementation other than the canonical one is used. If you use typedefs, it is advisable to work through the list from the bottom up: the tiled 4 by 4 product already has the right defaults. The platform-specific version needs a non-default Backup parameter, which also requires specifying the Assign parameter because it precedes Backup in the parameter list. We keep this combined functor type as a type definition and finally use it in the BLAS functor. Here we directly create an object of this type, which can later be called like a function:
using assign::assign_sum;
typedef gen_platform_dmat_dmat_mult_t<assign_sum, gen_tiling_44_dmat_dmat_mult_t> platform_mult_type;
gen_blas_dmat_dmat_mult_t<assign_sum, platform_mult_type> my_mult;
// ...
my_mult(A, B, C);
We have now defined a functor that can handle arbitrary combinations of dense matrix types. We have also specified our preferences for how to compute this operation. When the compiler instantiates our functor for a given type combination, it takes the first admissible product implementation in our list.
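The selection of the first admissible implementation can be reproduced with a reduced chain of three functors. This is a sketch under stated assumptions: the names canonical_mult_t, platform_mult_t and blas_mult_t are simplified stand-ins for the MTL4 functors, the Assign parameter is omitted, the choice of float for the platform-specific path is arbitrary, and labels replace the actual products.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a dense matrix type (not the MTL4 class).
template <typename Value> struct dense2D {};

// End of the chain: handles every combination of matrix types.
struct canonical_mult_t {
    template <typename A, typename B, typename C>
    std::string operator()(A const&, B const&, C&) const { return "canonical"; }
};

// Platform-specific product, here only for float (an arbitrary choice).
template <typename A, typename B, typename C, typename Backup>
struct platform_mult_ft : public Backup {};

template <typename Backup>
struct platform_mult_ft<dense2D<float>, dense2D<float>, dense2D<float>, Backup> {
    std::string operator()(dense2D<float> const&, dense2D<float> const&,
                           dense2D<float>&) const { return "platform"; }
};

template <typename Backup = canonical_mult_t>
struct platform_mult_t : public Backup {
    template <typename A, typename B, typename C>
    std::string operator()(A const& a, B const& b, C& c) const {
        return platform_mult_ft<A, B, C, Backup>()(a, b, c);
    }
};

// Stand-in for the BLAS product, only for double.
template <typename A, typename B, typename C, typename Backup>
struct blas_mult_ft : public Backup {};

template <typename Backup>
struct blas_mult_ft<dense2D<double>, dense2D<double>, dense2D<double>, Backup> {
    std::string operator()(dense2D<double> const&, dense2D<double> const&,
                           dense2D<double>&) const { return "blas"; }
};

template <typename Backup = canonical_mult_t>
struct blas_mult_t : public Backup {
    template <typename A, typename B, typename C>
    std::string operator()(A const& a, B const& b, C& c) const {
        return blas_mult_ft<A, B, C, Backup>()(a, b, c);
    }
};

// Composition from the bottom up: canonical, then platform, then BLAS.
typedef platform_mult_t<canonical_mult_t> platform_mult_type;
typedef blas_mult_t<platform_mult_type> my_mult_type;

template <typename Value>
std::string composed_path() {
    dense2D<Value> a, b, c;
    my_mult_type my_mult;
    return my_mult(a, b, c);
}

inline void composition_demo() {
    assert(composed_path<double>() == "blas");    // first preference
    assert(composed_path<float>() == "platform"); // second preference
    assert(composed_path<int>() == "canonical");  // always admissible
}
```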
Why and How we use Functors -- MTL 4 -- Peter Gottschling and Andrew Lumsdaine
-- Generated on 19 May 2009 by Doxygen 1.5.5 -- Copyright 2007 by the Trustees of Indiana University.