Actions
|
|
[ Date Prev][ Date Next][ Thread Prev][ Thread Next][ Date Index][ Thread Index]
Re: [vsipl++] [patch] SAL dispatch for matrix and vector products
- To: VSIPL++ Developers List <vsipl++@xxxxxxxxxxxxxxxx>
- Subject: Re: [vsipl++] [patch] SAL dispatch for matrix and vector products
- From: Don McCoy <don@xxxxxxxxxxxxxxxx>
- Date: Mon, 14 Nov 2005 21:01:27 -0700
Here is an updated patch for SAL. In addition to the matrix-vector
products, it includes outer and gemp. Some comments are included below...
Jules Bergmann wrote:
Don McCoy wrote:
I am working on BLAS dispatch as well. Patch to follow. This one
just includes SAL. Hopefully I've separated them well.
Two issues worth pointing out:
1) The exec() function checks for row-major ordering because it
calls the newer SAL functions (mat_mul) that allow the column-stride
to be specified. I believe the rows must be unit stride. This is a
little less general than the older ones (mulx) which allow non-unit
strides. Recently, we heard back from Mercury that the older ones
perform better (at this time). Given that the older ones handle
non-unit strides and are faster, should we revert to using those? If
Mercury changes in the future, then we can follow.
Yes, we should revert to the old ones for now.
Because the old and new functions have different dispatch requirements
(for supported strides and mixing of dimension orderings), we should
have separate evaluators for each (as opposed to trying to hide the
different in sal::mmul).
I added SAL_USE_MAT_MUL for use in selecting the newer functions. The
default (0) selects the old ones. It is tempting to make this a
configuration option, but i really don't anticipate changing this switch
back and forth. I'm sure we'll just change it to a 1 when Mercury
indicates it is a good time to do so.
2) Split-complex products (other than vector-vector) are not
handled at this time. Just a reminder that we were going to discuss
how to address this issue sometime.
We should be able to handle this by:
- providing overloads of sal::mmul for std::pair<T*, T*>, and
- checking that all the matrices have the same complex format in
ct_valid.
Granted, we wont be able to fully exercise this until we get prod
integrated into the expression templates.
I added a function in tests/matvec-prod.cpp that calls 'generic_prod()'
directly with split-complex data. This lets us see that it is working
now. Until we change prod() itself, it will select the generic
evaluator that returns the data in interleaved form, regardless of the
layout of the inputs.
A note on testing: The easiest way to test the dispatch mechanism was
to insert debugging statements into the code in order to observe the SAL
functions being called. The code in tests/matvec-prod.cpp walks through
a very large assortment of types, ordering and matrix sizes. Checking
that SAL is used everywhere it could be is a manual process that is
likely to be error prone. The hope is that I did not miss any cases
where SAL could be used (e.g. an error in the rt_valid() function caused
the generic evaluator to be selected instead). Of course, cases where
SAL is used where it *shouldn't* be are already taken care of. :)
Coming at this from another angle, we could test each library for
certain capabilities and make sure it was used where/when it should be.
The only way i can think to do this would involve inserting code that
would allow us to detect (after a call to the dispatch mechanism)
/which/ evaluator succeeded. It may be too much work for not enough
benefit, but I thought I'd throw the idea out there just in case.
Regards,
--
Don McCoy
CodeSourcery, LLC
2005-11-14 Don McCoy <don@xxxxxxxxxxxxxxxx>
* tests/matvec-prod.cpp: added tests for special cases such
as split complex layout and subviews (different strides).
* src/vsip/impl/eval-sal.hpp: new file. dispatch routines
for matrix/vector products, outer and gemp.
* src/vsip/impl/matvec-prod.hpp: include eval-sal.hpp.
* src/vsip/impl/matvec.hpp: include eval-sal.hpp and math-enum.hpp.
* src/vsip/impl/sal.hpp: added new overloaded translation
functions for matrix/vector products, outer and gemp.
Index: tests/matvec-prod.cpp
===================================================================
RCS file: /home/cvs/Repository/vpp/tests/matvec-prod.cpp,v
retrieving revision 1.4
diff -c -p -r1.4 matvec-prod.cpp
*** tests/matvec-prod.cpp 11 Nov 2005 00:07:59 -0000 1.4
--- tests/matvec-prod.cpp 15 Nov 2005 03:26:32 -0000
*************** test_prod_rand(length_type m, length_typ
*** 165,170 ****
--- 165,281 ----
}
+
+ template <typename T>
+ void
+ test_mm_prod_subview( const length_type m,
+ const length_type n,
+ const length_type k )
+ {
+ typedef typename Matrix<T>::subview_type matrix_subview_type;
+ typedef typename vsip::impl::Scalar_of<T>::type scalar_type;
+
+ // non-unit strides - dense rows, non-dense columns
+ {
+ Matrix<T> aa(m, k*2, T());
+ Matrix<T> bb(k, n*3, T());
+ matrix_subview_type a = aa(Domain<2>(
+ Domain<1>(0, 1, m), Domain<1>(0, 2, k)));
+ matrix_subview_type b = bb(Domain<2>(
+ Domain<1>(0, 1, k), Domain<1>(0, 3, n)));
+ Matrix<T> res(m, n);
+ Matrix<T> chk(m, n);
+ Matrix<scalar_type> gauge(m, n);
+
+ randm(a);
+ randm(b);
+
+ res = prod( a, b );
+ chk = ref::prod( a, b );
+ gauge = ref::prod(mag(a), mag(b));
+
+ for (index_type i=0; i<gauge.size(0); ++i)
+ for (index_type j=0; j<gauge.size(1); ++j)
+ if (!(gauge(i, j) > scalar_type()))
+ gauge(i, j) = scalar_type(1);
+
+ check_prod( res, chk, gauge );
+ }
+
+ // non-unit strides - non-dense rows, dense columns
+ {
+ Matrix<T> aa(m*2, k, T());
+ Matrix<T> bb(k*3, n, T());
+ matrix_subview_type a = aa(Domain<2>(
+ Domain<1>(0, 2, m), Domain<1>(0, 1, k)));
+ matrix_subview_type b = bb(Domain<2>(
+ Domain<1>(0, 3, k), Domain<1>(0, 1, n)));
+
+ Matrix<T> res(m, n);
+ Matrix<T> chk(m, n);
+ Matrix<scalar_type> gauge(m, n);
+
+ randm(a);
+ randm(b);
+
+ res = prod( a, b );
+ chk = ref::prod( a, b );
+ gauge = ref::prod(mag(a), mag(b));
+
+ for (index_type i=0; i<gauge.size(0); ++i)
+ for (index_type j=0; j<gauge.size(1); ++j)
+ if (!(gauge(i, j) > scalar_type()))
+ gauge(i, j) = scalar_type(1);
+
+ check_prod( res, chk, gauge );
+ }
+ }
+
+
+
+ template <typename T>
+ void
+ test_mm_prod_complex_split( const length_type m,
+ const length_type n,
+ const length_type k )
+ {
+ typedef impl::Fast_block<2, complex<T>,
+ impl::Layout<2, row2_type,
+ impl::Stride_unit_dense,
+ impl::Cmplx_split_fmt> > split_type;
+ typedef typename vsip::impl::Scalar_of<T>::type scalar_type;
+
+ Matrix<complex<T>, split_type> a(m, k);
+ Matrix<complex<T>, split_type> b(k, n);
+ Matrix<complex<T>, split_type> res(m, n, T());
+
+ randm(a);
+ randm(b);
+
+ // call prod()'s underlying interface directly
+ impl::generic_prod(a, b, res);
+
+ // compute a reference matrix using interleaved (default) layout
+ Matrix<complex<T> > aa(m, k);
+ Matrix<complex<T> > bb(k, n);
+ aa = a;
+ bb = b;
+
+ Matrix<complex<T> > chk(m, n);
+ Matrix<scalar_type> gauge(m, n);
+ chk = ref::prod( aa, bb );
+ gauge = ref::prod( mag(aa), mag(bb) );
+
+ for (index_type i=0; i<gauge.size(0); ++i)
+ for (index_type j=0; j<gauge.size(1); ++j)
+ if (!(gauge(i, j) > scalar_type()))
+ gauge(i, j) = scalar_type(1);
+
+ check_prod( res, chk, gauge );
+ }
+
+
+
template <typename T0,
typename T1>
void
*************** prod_cases_complex_only()
*** 546,551 ****
--- 657,673 ----
}
+ void
+ prod_special_cases()
+ {
+ test_mm_prod_subview<float>(5, 7, 3);
+ test_mm_prod_subview<double>(5, 7, 3);
+
+ test_mm_prod_complex_split<float>(5, 7, 3);
+ test_mm_prod_complex_split<double>(5, 7, 3);
+ }
+
+
/***********************************************************************
*************** main(int argc, char** argv)
*** 563,568 ****
--- 685,691 ----
Precision_traits<float>::compute_eps();
Precision_traits<double>::compute_eps();
+
prod_cases<float, float>();
prod_cases<double, double>();
prod_cases<float, double>();
*************** main(int argc, char** argv)
*** 572,575 ****
--- 695,700 ----
prod_cases<complex<float>, float >();
prod_cases_complex_only<complex<float>, complex<float> >();
+
+ prod_special_cases();
}
Index: src/vsip/impl/eval-sal.hpp
===================================================================
RCS file: src/vsip/impl/eval-sal.hpp
diff -N src/vsip/impl/eval-sal.hpp
*** /dev/null 1 Jan 1970 00:00:00 -0000
--- src/vsip/impl/eval-sal.hpp 15 Nov 2005 03:26:33 -0000
***************
*** 0 ****
--- 1,959 ----
+ /* Copyright (c) 2005 by CodeSourcery, LLC. All rights reserved. */
+
+ /** @file vsip/impl/eval-sal.hpp
+ @author Don McCoy
+ @date 2005-10-17
+ @brief VSIPL++ Library: SAL evaluators (for use in general dispatch).
+
+ */
+
+ #ifndef VSIP_IMPL_EVAL_SAL_HPP
+ #define VSIP_IMPL_EVAL_SAL_HPP
+
+ #ifdef VSIP_IMPL_HAVE_SAL
+
+ /***********************************************************************
+ Included Files
+ ***********************************************************************/
+
+ #include <vsip/impl/general_dispatch.hpp>
+ #include <vsip/impl/sal.hpp>
+
+
+ // This switch chooses between two sets of matrix-multiply functions
+ // provided by SAL. Setting to '1' will select the new sal::mat_mul()
+ // variants and setting it to '0' will select the older sal::mmul()
+ // set. At present, Mecury states that the mmul() functions are
+ // faster, but the API is changing towards that used with mat_mul().
+ // See 'sal.hpp' for details as to the differences.
+
+ #define SAL_USE_MAT_MUL 0
+
+
+
+ /***********************************************************************
+ Declarations
+ ***********************************************************************/
+
+ namespace vsip
+ {
+
+ namespace impl
+ {
+
+ // SAL evaluator for vector-vector outer product
+
+ template <typename T1,
+ typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_vv_outer, Block0, Op_list_3<T1, Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T1>::valid &&
+ Type_equal<T1, typename Block1::value_type>::value &&
+ Type_equal<T1, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0;
+
+ static bool rt_valid(Block0& r, T1 alpha, Block1 const& a, Block2 const& b)
+ {
+ typedef typename Block_layout<Block1>::order_type order0_type;
+ dimension_type const r_dim1 = order0_type::impl_dim1;
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ bool unit_stride =
+ (ext_r.stride(r_dim1) == 1) &&
+ (ext_a.stride(0) == 1) &&
+ (ext_b.stride(0) == 1);
+
+ return unit_stride;
+ }
+
+ static void exec(Block0& r, T1 alpha, Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(1, 0) == r.size(2, 0));
+ assert(b.size(1, 0) == r.size(2, 1));
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ if (Type_equal<order0_type, row2_type>::value)
+ {
+ sal::mat_mul( a.size(1, 0), b.size(1, 0),
+ 1,
+ ext_a.data(), ext_a.stride(0),
+ ext_b.data(), ext_b.stride(0),
+ ext_r.data(), ext_r.stride(0),
+ 0 );
+ }
+ else if (Type_equal<order0_type, col2_type>::value)
+ {
+ // Use identity:
+ // R = A B <=> trans(R) = trans(B) trans(A)
+
+ sal::mat_mul( b.size(1, 0), a.size(1, 0),
+ 1,
+ ext_b.data(), ext_b.stride(0),
+ ext_a.data(), ext_a.stride(0),
+ ext_r.data(), ext_r.stride(1),
+ 0 );
+ }
+ else
+ assert(0);
+
+ // SAL does not support a scaling parameter
+ sal::vsmul( ext_r.data(), 1,
+ &alpha,
+ ext_r.data(), 1,
+ r.size() );
+ }
+ };
+
+
+ template <typename T1,
+ typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_vv_outer, Block0,
+ Op_list_3<std::complex<T1>, Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ typedef typename Block_layout<Block0>::complex_type complex0_type;
+ typedef typename Block_layout<Block1>::complex_type complex1_type;
+ typedef typename Block_layout<Block2>::complex_type complex2_type;
+
+ static bool const ct_valid =
+ impl::sal::Sal_traits<std::complex<T1> >::valid &&
+ Type_equal<std::complex<T1>, typename Block1::value_type>::value &&
+ Type_equal<std::complex<T1>, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0 &&
+ // check complex layout
+ Type_equal<complex0_type, complex1_type>::value &&
+ Type_equal<complex1_type, complex2_type>::value;
+
+ static bool rt_valid(Block0& r, std::complex<T1> alpha,
+ Block1 const& a, Block2 const& b)
+ {
+ typedef typename Block_layout<Block1>::order_type order0_type;
+ dimension_type const r_dim1 = order0_type::impl_dim1;
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ bool unit_stride =
+ (ext_r.stride(r_dim1) == 1) &&
+ (ext_a.stride(0) == 1) &&
+ (ext_b.stride(0) == 1);
+
+ return unit_stride;
+ }
+
+ static void exec(Block0& r, std::complex<T1> alpha,
+ Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(1, 0) == r.size(2, 0));
+ assert(b.size(1, 0) == r.size(2, 1));
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ if (Type_equal<order0_type, row2_type>::value)
+ {
+ sal::mat_mul( a.size(1, 0), b.size(1, 0),
+ 1,
+ ext_a.data(), ext_a.stride(0),
+ ext_b.data(), ext_b.stride(0),
+ ext_r.data(), ext_r.stride(0),
+ SAL_CONJUGATE_RIGHT );
+ }
+ else if (Type_equal<order0_type, col2_type>::value)
+ {
+ // Use identity:
+ // R = A B <=> trans(R) = trans(B) trans(A)
+
+ sal::mat_mul( b.size(1, 0), a.size(1, 0),
+ 1,
+ ext_b.data(), ext_b.stride(0),
+ ext_a.data(), ext_a.stride(0),
+ ext_r.data(), ext_r.stride(1),
+ SAL_CONJUGATE_LEFT );
+ }
+ else
+ assert(0);
+
+ // SAL does not support a scaling parameter
+ sal::vsmul( ext_r.data(), 1,
+ &alpha,
+ ext_r.data(), 1,
+ r.size() );
+ }
+ };
+
+
+
+ // SAL evaluator for vector-vector dot-product (non-conjugated).
+
+ template <typename T,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_vv_dot, Return_scalar<T>, Op_list_2<Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T>::valid &&
+ Type_equal<T, typename Block1::value_type>::value &&
+ Type_equal<T, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0;
+
+ static bool rt_valid(Block1 const&, Block2 const&) { return true; }
+
+ static T exec(Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(1, 0) == b.size(1, 0));
+
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ T r = sal::dot( a.size(1, 0),
+ ext_a.data(), ext_a.stride(0),
+ ext_b.data(), ext_b.stride(0) );
+
+ return r;
+ }
+ };
+
+
+
+ // SAL evaluator for vector-vector dot-product (conjugated).
+
+ template <typename T,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_vv_dot, Return_scalar<complex<T> >,
+ Op_list_2<Block1,
+ Unary_expr_block<1, impl::conj_functor,
+ Block2, complex<T> > const>,
+ Mercury_sal_tag>
+ {
+ static bool const ct_valid =
+ impl::sal::Sal_traits<complex<T> >::valid &&
+ Type_equal<complex<T>, typename Block1::value_type>::value &&
+ Type_equal<complex<T>, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0;
+
+ static bool rt_valid(
+ Block1 const&,
+ Unary_expr_block<1, impl::conj_functor, Block2, complex<T> > const&)
+ { return true; }
+
+ static complex<T> exec(
+ Block1 const& a,
+ Unary_expr_block<1, impl::conj_functor, Block2, complex<T> > const& b)
+ {
+ assert(a.size(1, 0) == b.size(1, 0));
+
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b.op()));
+
+ return sal::dotc( a.size(1, 0),
+ ext_b.data(), ext_b.stride(0),
+ ext_a.data(), ext_a.stride(0) );
+ // Note:
+ // SAL cidotprx(x, y) => conj(x) * y, while
+ // VSIPL++ cvjdot(x, y) => x * conj(y)
+ }
+ };
+
+
+
+ #if SAL_USE_MAT_MUL
+
+ // SAL evaluator for matrix-vector product
+
+ template <typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_mv, Block0, Op_list_2<Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ typedef typename Block0::value_type T;
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ typedef typename Block_layout<Block0>::complex_type complex0_type;
+ typedef typename Block_layout<Block1>::complex_type complex1_type;
+ typedef typename Block_layout<Block2>::complex_type complex2_type;
+
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T>::valid &&
+ Type_equal<T, typename Block0::value_type>::value &&
+ Type_equal<T, typename Block1::value_type>::value &&
+ Type_equal<T, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0 &&
+ // check complex layout
+ Type_equal<complex0_type, complex1_type>::value &&
+ Type_equal<complex1_type, complex2_type>::value;
+
+ static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ dimension_type const a_dim0 = order1_type::impl_dim0;
+ dimension_type const a_dim1 = order1_type::impl_dim1;
+
+ bool unit_minor_stride =
+ (ext_a.stride(a_dim0) == ext_a.size(a_dim1) * ext_a.stride(a_dim1)) &&
+ (ext_r.stride(0) == 1) &&
+ (ext_b.stride(0) == 1);
+
+ return unit_minor_stride;
+ }
+
+ static void exec(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(2, 1) == b.size(1, 0));
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ if (Type_equal<order1_type, row2_type>::value)
+ {
+ sal::mat_mul( a.size(2, 0), 1,
+ a.size(2, 1),
+ ext_a.data(), ext_a.stride(0),
+ ext_b.data(), ext_b.stride(0),
+ ext_r.data(), ext_r.stride(0),
+ 0 );
+ }
+ else if (Type_equal<order1_type, col2_type>::value)
+ {
+ sal::mat_mul( 1, a.size(2, 0),
+ a.size(2, 1),
+ ext_b.data(), ext_b.stride(0),
+ ext_a.data(), ext_a.stride(1),
+ ext_r.data(), ext_r.stride(0),
+ 0 );
+ }
+ else
+ assert(0);
+ }
+ };
+
+
+ // SAL evaluator for vector-matrix product
+
+ template <typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_vm, Block0, Op_list_2<Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ typedef typename Block0::value_type T;
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ typedef typename Block_layout<Block0>::complex_type complex0_type;
+ typedef typename Block_layout<Block1>::complex_type complex1_type;
+ typedef typename Block_layout<Block2>::complex_type complex2_type;
+
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T>::valid &&
+ Type_equal<T, typename Block0::value_type>::value &&
+ Type_equal<T, typename Block1::value_type>::value &&
+ Type_equal<T, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0 &&
+ // check complex layout
+ Type_equal<complex0_type, complex1_type>::value &&
+ Type_equal<complex1_type, complex2_type>::value;
+
+ static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ dimension_type const b_dim0 = order2_type::impl_dim0;
+ dimension_type const b_dim1 = order2_type::impl_dim1;
+
+ bool unit_minor_stride =
+ (ext_b.stride(b_dim0) == ext_b.size(b_dim1) * ext_b.stride(b_dim1)) &&
+ (ext_r.stride(0) == 1) &&
+ (ext_a.stride(0) == 1);
+
+ return unit_minor_stride;
+ }
+
+ static void exec(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(1, 0) == b.size(2, 0));
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ if (Type_equal<order2_type, row2_type>::value)
+ {
+ sal::mat_mul( 1, b.size(2, 1),
+ b.size(2, 0),
+ ext_a.data(), ext_a.stride(0),
+ ext_b.data(), ext_b.stride(0),
+ ext_r.data(), ext_r.stride(0),
+ 0 );
+ }
+ else if (Type_equal<order2_type, col2_type>::value)
+ {
+ sal::mat_mul( b.size(2, 1), 1,
+ b.size(2, 0),
+ ext_b.data(), ext_b.stride(1),
+ ext_a.data(), ext_a.stride(0),
+ ext_r.data(), ext_r.stride(0),
+ 0 );
+ }
+ else
+ assert(0);
+ }
+ };
+
+
+ // SAL evaluator for matrix-matrix products.
+
+ template <typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_mm, Block0, Op_list_2<Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ typedef typename Block0::value_type T;
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ typedef typename Block_layout<Block0>::complex_type complex0_type;
+ typedef typename Block_layout<Block1>::complex_type complex1_type;
+ typedef typename Block_layout<Block2>::complex_type complex2_type;
+
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T>::valid &&
+ Type_equal<T, typename Block0::value_type>::value &&
+ Type_equal<T, typename Block1::value_type>::value &&
+ Type_equal<T, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0 &&
+ // check dimension ordering
+ order0_type::impl_dim0 == order1_type::impl_dim0 &&
+ order1_type::impl_dim0 == order2_type::impl_dim0 &&
+ // check complex layout
+ Type_equal<complex0_type, complex1_type>::value &&
+ Type_equal<complex1_type, complex2_type>::value;
+
+ static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ dimension_type const r_dim0 = order0_type::impl_dim0;
+ dimension_type const r_dim1 = order0_type::impl_dim1;
+ dimension_type const a_dim0 = order1_type::impl_dim0;
+ dimension_type const a_dim1 = order1_type::impl_dim1;
+ dimension_type const b_dim0 = order2_type::impl_dim0;
+ dimension_type const b_dim1 = order2_type::impl_dim1;
+
+ bool unit_minor_stride =
+ (ext_r.stride(r_dim1) == 1) &&
+ (ext_a.stride(a_dim1) == 1) &&
+ (ext_b.stride(b_dim1) == 1);
+
+ return unit_minor_stride;
+ }
+
+ static void exec(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(2, 0) == r.size(2, 0));
+ assert(a.size(2, 1) == b.size(2, 0));
+ assert(b.size(2, 1) == r.size(2, 1));
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ if (Type_equal<order0_type, row2_type>::value)
+ {
+ sal::mat_mul( a.size(2, 0), b.size(2, 1),
+ b.size(2, 0),
+ ext_a.data(), ext_a.stride(0),
+ ext_b.data(), ext_b.stride(0),
+ ext_r.data(), ext_r.stride(0),
+ 0 );
+ }
+ else if (Type_equal<order0_type, col2_type>::value)
+ {
+ // use r = a b ==> trans(r) = trans(b) trans(a)
+
+ sal::mat_mul( b.size(2, 1), a.size(2, 0),
+ b.size(2, 0),
+ ext_b.data(), ext_b.stride(1),
+ ext_a.data(), ext_a.stride(1),
+ ext_r.data(), ext_r.stride(1),
+ 0 );
+ }
+ else assert(0);
+ }
+
+ };
+
+
+
+ #else // !SAL_USE_MAT_MUL
+
+ // SAL evaluator for matrix-vector product
+
+ template <typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_mv, Block0, Op_list_2<Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ typedef typename Block0::value_type T;
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ typedef typename Block_layout<Block0>::complex_type complex0_type;
+ typedef typename Block_layout<Block1>::complex_type complex1_type;
+ typedef typename Block_layout<Block2>::complex_type complex2_type;
+
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T>::valid &&
+ Type_equal<T, typename Block0::value_type>::value &&
+ Type_equal<T, typename Block1::value_type>::value &&
+ Type_equal<T, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0 &&
+ // check complex layout
+ Type_equal<complex0_type, complex1_type>::value &&
+ Type_equal<complex1_type, complex2_type>::value;
+
+ static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+
+ dimension_type const a_dim0 = order1_type::impl_dim0;
+ dimension_type const a_dim1 = order1_type::impl_dim1;
+
+ bool dense_major_dim =
+ (ext_a.stride(a_dim0) == ext_a.size(a_dim1) * ext_a.stride(a_dim1));
+
+ return dense_major_dim;
+ }
+
+ static void exec(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(2, 1) == b.size(1, 0));
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ if (Type_equal<order1_type, row2_type>::value)
+ {
+ sal::mmul( a.size(2, 0), // M
+ 1, // N
+ a.size(2, 1), // P
+ ext_a.data(), ext_a.stride(1),
+ ext_b.data(), ext_b.stride(0),
+ ext_r.data(), ext_r.stride(0) );
+ }
+ else if (Type_equal<order1_type, col2_type>::value)
+ {
+ sal::mmul( 1, // M
+ a.size(2, 0), // N
+ a.size(2, 1), // P
+ ext_b.data(), ext_b.stride(0),
+ ext_a.data(), ext_a.stride(0),
+ ext_r.data(), ext_r.stride(0) );
+ }
+ else
+ assert(0);
+ }
+ };
+
+
+ // SAL evaluator for vector-matrix product
+
+ template <typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_vm, Block0, Op_list_2<Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ typedef typename Block0::value_type T;
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ typedef typename Block_layout<Block0>::complex_type complex0_type;
+ typedef typename Block_layout<Block1>::complex_type complex1_type;
+ typedef typename Block_layout<Block2>::complex_type complex2_type;
+
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T>::valid &&
+ Type_equal<T, typename Block0::value_type>::value &&
+ Type_equal<T, typename Block1::value_type>::value &&
+ Type_equal<T, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0 &&
+ // check complex layout
+ Type_equal<complex0_type, complex1_type>::value &&
+ Type_equal<complex1_type, complex2_type>::value;
+
+ static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ dimension_type const b_dim0 = order2_type::impl_dim0;
+ dimension_type const b_dim1 = order2_type::impl_dim1;
+
+ bool dense_major_dim =
+ (ext_b.stride(b_dim0) == ext_b.size(b_dim1) * ext_b.stride(b_dim1));
+
+ return dense_major_dim;
+ }
+
+ static void exec(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(1, 0) == b.size(2, 0));
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ if (Type_equal<order2_type, row2_type>::value)
+ {
+ sal::mmul( 1, // M
+ b.size(2, 1), // N
+ b.size(2, 0), // P
+ ext_a.data(), ext_a.stride(0),
+ ext_b.data(), ext_b.stride(1),
+ ext_r.data(), ext_r.stride(0) );
+ }
+ else if (Type_equal<order2_type, col2_type>::value)
+ {
+ sal::mmul( b.size(2, 1), // M
+ 1, // N
+ b.size(2, 0), // P
+ ext_b.data(), ext_b.stride(0),
+ ext_a.data(), ext_a.stride(0),
+ ext_r.data(), ext_r.stride(0) );
+ }
+ else
+ assert(0);
+ }
+ };
+
+
+ // SAL evaluator for matrix-matrix products.
+
+ template <typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_mm, Block0, Op_list_2<Block1, Block2>,
+ Mercury_sal_tag>
+ {
+ typedef typename Block0::value_type T;
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ typedef typename Block_layout<Block0>::complex_type complex0_type;
+ typedef typename Block_layout<Block1>::complex_type complex1_type;
+ typedef typename Block_layout<Block2>::complex_type complex2_type;
+
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T>::valid &&
+ Type_equal<T, typename Block0::value_type>::value &&
+ Type_equal<T, typename Block1::value_type>::value &&
+ Type_equal<T, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0 &&
+ // check dimension ordering
+ order0_type::impl_dim0 == order1_type::impl_dim0 &&
+ order1_type::impl_dim0 == order2_type::impl_dim0 &&
+ // check complex layout
+ Type_equal<complex0_type, complex1_type>::value &&
+ Type_equal<complex1_type, complex2_type>::value;
+
+
+ static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ dimension_type const r_dim0 = order0_type::impl_dim0;
+ dimension_type const r_dim1 = order0_type::impl_dim1;
+ dimension_type const a_dim0 = order1_type::impl_dim0;
+ dimension_type const a_dim1 = order1_type::impl_dim1;
+ dimension_type const b_dim0 = order2_type::impl_dim0;
+ dimension_type const b_dim1 = order2_type::impl_dim1;
+
+ bool dense_major_dim =
+ (ext_r.stride(r_dim0) == ext_r.size(r_dim1) * ext_r.stride(r_dim1)) &&
+ (ext_a.stride(a_dim0) == ext_a.size(a_dim1) * ext_a.stride(a_dim1)) &&
+ (ext_b.stride(b_dim0) == ext_b.size(b_dim1) * ext_b.stride(b_dim1));
+
+ return dense_major_dim;
+ }
+
+ static void exec(Block0& r, Block1 const& a, Block2 const& b)
+ {
+ assert(a.size(2, 0) == r.size(2, 0));
+ assert(a.size(2, 1) == b.size(2, 0));
+ assert(b.size(2, 1) == r.size(2, 1));
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ if (Type_equal<order0_type, row2_type>::value)
+ {
+ sal::mmul( a.size(2, 0), // M
+ b.size(2, 1), // N
+ a.size(2, 1), // P
+ ext_a.data(), ext_a.stride(1),
+ ext_b.data(), ext_b.stride(1),
+ ext_r.data(), ext_r.stride(1) );
+ }
+ else if (Type_equal<order0_type, col2_type>::value)
+ {
+ // use r = a b ==> trans(r) = trans(b) trans(a)
+
+ sal::mmul( b.size(2, 1), // M
+ a.size(2, 0), // N
+ b.size(2, 0), // P
+ ext_b.data(), ext_b.stride(0),
+ ext_a.data(), ext_a.stride(0),
+ ext_r.data(), ext_r.stride(0) );
+ }
+ else assert(0);
+ }
+ };
+
+ #endif
+
+
+
+
+ template <typename T>
+ struct Mul_inv
+ {
+ typedef T result_type;
+ static result_type apply(T value)
+ {
+ return T(1) / value;
+ }
+ };
+
+
+ template <typename T>
+ struct Mul_inv<std::complex<T> >
+ {
+ typedef std::complex<T> result_type;
+ static result_type apply(std::complex<T> value)
+ {
+ return complex<T>(1, 0) / value;
+ }
+ };
+
+ /// gives real or complex multiplicative inverse, depending on value type.
+ template <typename T>
+ inline
+ typename Mul_inv<T>::result_type
+ multiplicative_inverse(T value)
+ {
+ return Mul_inv<T>::apply(value);
+ };
+
+
+
+
+ // SAL evaluator for generalized matrix-matrix products.
+
+ template <typename T1,
+ typename T2,
+ typename Block0,
+ typename Block1,
+ typename Block2>
+ struct Evaluator<Op_prod_mm, Block0, Op_list_4<T1, Block1, Block2, T2>,
+ Mercury_sal_tag>
+ {
+ typedef typename Block0::value_type T;
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ typedef typename Block_layout<Block0>::complex_type complex0_type;
+ typedef typename Block_layout<Block1>::complex_type complex1_type;
+ typedef typename Block_layout<Block2>::complex_type complex2_type;
+
+ static bool const ct_valid =
+ impl::sal::Sal_traits<T>::valid &&
+ Type_equal<T, typename Block0::value_type>::value &&
+ Type_equal<T, typename Block1::value_type>::value &&
+ Type_equal<T, typename Block2::value_type>::value &&
+ // check that direct access is supported
+ Ext_data_cost<Block0>::value == 0 &&
+ Ext_data_cost<Block1>::value == 0 &&
+ Ext_data_cost<Block2>::value == 0 &&
+ // check dimension ordering
+ order0_type::impl_dim0 == order1_type::impl_dim0 &&
+ order1_type::impl_dim0 == order2_type::impl_dim0 &&
+ // check complex layout
+ Type_equal<complex0_type, complex1_type>::value &&
+ Type_equal<complex1_type, complex2_type>::value;
+
+ static bool rt_valid(Block0& r, T1 alpha, Block1 const& a,
+ Block2 const& b, T2 beta)
+ {
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ dimension_type const r_dim0 = order0_type::impl_dim0;
+ dimension_type const r_dim1 = order0_type::impl_dim1;
+ dimension_type const a_dim0 = order1_type::impl_dim0;
+ dimension_type const a_dim1 = order1_type::impl_dim1;
+ dimension_type const b_dim0 = order2_type::impl_dim0;
+ dimension_type const b_dim1 = order2_type::impl_dim1;
+
+ bool unit_minor_stride =
+ (ext_r.stride(r_dim1) == 1) &&
+ (ext_a.stride(a_dim1) == 1) &&
+ (ext_b.stride(b_dim1) == 1);
+
+ return unit_minor_stride;
+ }
+
+ static void exec(Block0& r, T1 alpha, Block1 const& a,
+ Block2 const& b, T2 beta)
+ {
+ typedef typename Block0::value_type RT;
+
+ typedef typename Block_layout<Block0>::order_type order0_type;
+ typedef typename Block_layout<Block1>::order_type order1_type;
+ typedef typename Block_layout<Block2>::order_type order2_type;
+
+ Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+ Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+ Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+
+ // to implement C = alpha A B + beta C, the scaling factors
+ // must be implemented manually
+
+ sal::vsmul( ext_r.data(), 1,
+ &beta,
+ ext_r.data(), 1,
+ r.size() );
+
+
+ // we must fold the scaling into this prior to calling the function
+ // because of the accumulation option, this must be undone prior
+ // to returning
+ sal::vsmul( ext_a.data(), 1,
+ &alpha,
+ ext_a.data(), 1,
+ a.size() );
+
+
+ if (Type_equal<order0_type, row2_type>::value)
+ {
+ sal::mat_mul( a.size(2, 0), b.size(2, 1),
+ b.size(2, 0),
+ ext_a.data(), ext_a.stride(0),
+ ext_b.data(), ext_b.stride(0),
+ ext_r.data(), ext_r.stride(0),
+ SAL_ACCUMULATE );
+ }
+ else if (Type_equal<order0_type, col2_type>::value)
+ {
+ // Use identity:
+ // R = A B <=> trans(R) = trans(B) trans(A)
+
+ sal::mat_mul( b.size(2, 1), a.size(2, 0),
+ b.size(2, 0),
+ ext_b.data(), ext_b.stride(1),
+ ext_a.data(), ext_a.stride(1),
+ ext_r.data(), ext_r.stride(1),
+ SAL_ACCUMULATE );
+ }
+ else assert(0);
+
+
+ // undo the scaling of alpha
+ T1 one_over_alpha = multiplicative_inverse( alpha );
+
+ sal::vsmul( ext_a.data(), 1,
+ &one_over_alpha,
+ ext_a.data(), 1,
+ a.size() );
+ }
+ };
+
+
+
+ } // namespace vsip::impl
+
+ } // namespace vsip
+
+ #endif // VSIP_IMPL_HAVE_SAL
+
+ #endif // VSIP_IMPL_EVAL_SAL_HPP
Index: src/vsip/impl/matvec-prod.hpp
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/impl/matvec-prod.hpp,v
retrieving revision 1.4
diff -c -p -r1.4 matvec-prod.hpp
*** src/vsip/impl/matvec-prod.hpp 11 Nov 2005 00:08:21 -0000 1.4
--- src/vsip/impl/matvec-prod.hpp 15 Nov 2005 03:26:33 -0000
***************
*** 18,23 ****
--- 18,24 ----
#include <vsip/matrix.hpp>
#include <vsip/impl/matvec.hpp>
#include <vsip/impl/eval-blas.hpp>
+ #include <vsip/impl/eval-sal.hpp>
Index: src/vsip/impl/matvec.hpp
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/impl/matvec.hpp,v
retrieving revision 1.6
diff -c -p -r1.6 matvec.hpp
*** src/vsip/impl/matvec.hpp 11 Nov 2005 00:08:21 -0000 1.6
--- src/vsip/impl/matvec.hpp 15 Nov 2005 03:26:33 -0000
***************
*** 21,26 ****
--- 21,28 ----
#include <vsip/impl/fns_elementwise.hpp>
#include <vsip/impl/general_dispatch.hpp>
#include <vsip/impl/eval-blas.hpp>
+ #include <vsip/impl/eval-sal.hpp>
+ #include <vsip/impl/math-enum.hpp>
namespace vsip
Index: src/vsip/impl/sal.hpp
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/impl/sal.hpp,v
retrieving revision 1.1
diff -c -p -r1.1 sal.hpp
*** src/vsip/impl/sal.hpp 14 Oct 2005 14:07:45 -0000 1.1
--- src/vsip/impl/sal.hpp 15 Nov 2005 03:26:33 -0000
***************
*** 14,19 ****
--- 14,22 ----
Included Files
***********************************************************************/
+ #include <complex>
+ #include <sal.h>
+
#include <vsip/support.hpp>
#include <vsip/impl/block-traits.hpp>
#include <vsip/impl/expr_serial_evaluator.hpp>
*************** namespace impl
*** 32,37 ****
--- 35,248 ----
namespace sal
{
+ #define VSIP_IMPL_SAL_DOT( T, SAL_T, VPPFCN, SALFCN, STRIDE ) \
+ inline T \
+ VPPFCN( length_type len, \
+ T * A, stride_type A_stride, \
+ T * B, stride_type B_stride ) \
+ { \
+ T c; \
+ SALFCN( (SAL_T *) A, STRIDE * A_stride, \
+ (SAL_T *) B, STRIDE * B_stride, \
+ (SAL_T *) &c, len, 0 ); \
+ return c; \
+ }
+
+ #define VSIP_IMPL_SAL_DOT_SPLIT( T, SAL_T, VPPFCN, SALFCN, STRIDE ) \
+ inline std::complex<T> \
+ VPPFCN( length_type len, \
+ std::pair<T *, T *> const A, stride_type A_stride, \
+ std::pair<T *, T *> const B, stride_type B_stride )\
+ { \
+ T real_val, imag_val; \
+ SAL_T c = { &real_val, &imag_val }; \
+ SALFCN( (SAL_T *) &A, STRIDE * A_stride, \
+ (SAL_T *) &B, STRIDE * B_stride, \
+ (SAL_T *) &c, len, 0 ); \
+ \
+ std::complex<T> r(*c.realp, *c.imagp); \
+ return r; \
+ }
+
+ VSIP_IMPL_SAL_DOT( float, float, dot, dotprx, 1 );
+ VSIP_IMPL_SAL_DOT( double, double, dot, dotprdx, 1 );
+ VSIP_IMPL_SAL_DOT( std::complex<float>, COMPLEX, dot, cdotprx, 2 );
+ VSIP_IMPL_SAL_DOT( std::complex<double>, DOUBLE_COMPLEX, dot, cdotprdx, 2 );
+ VSIP_IMPL_SAL_DOT_SPLIT( float, COMPLEX_SPLIT, dot, zdotprx, 1 );
+ VSIP_IMPL_SAL_DOT_SPLIT( double, DOUBLE_COMPLEX_SPLIT, dot, zdotprdx, 1 );
+
+ VSIP_IMPL_SAL_DOT( std::complex<float>, COMPLEX, dotc, cidotprx, 2 );
+ VSIP_IMPL_SAL_DOT( std::complex<double>, DOUBLE_COMPLEX, dotc, cidotprdx, 2 );
+ VSIP_IMPL_SAL_DOT_SPLIT( float, COMPLEX_SPLIT, dotc, zidotprx, 1 );
+ VSIP_IMPL_SAL_DOT_SPLIT( double, DOUBLE_COMPLEX_SPLIT, dotc, zidotprdx, 1 );
+
+ #undef VSIP_IMPL_SAL_DOT
+ #undef VSIP_IMPL_SAL_DOT_SPLIT
+
+
+
+ // Functions that take col-stride parameters, where rows must be unit-stride.
+ // Also supports transpose and accumulate operations.
+
+ #define VSIP_IMPL_SAL_MAT_PROD( T, SAL_T, SALFCN ) \
+ inline void \
+ mat_mul( int nr_c, int nc_r, \
+ int n, \
+ T *a, int a_tcols, \
+ T *b, int b_tcols, \
+ T *c, int c_tcols, \
+ int mflag ) \
+ { \
+ SALFCN( (SAL_T *) a, a_tcols, \
+ (SAL_T *) b, b_tcols, \
+ (SAL_T *) c, c_tcols, \
+ nr_c, nc_r, n, \
+ mflag, 0 ); \
+ }
+
+ #define VSIP_IMPL_SAL_MAT_PROD_SPLIT( T, SAL_T, SALFCN ) \
+ inline void \
+ mat_mul( int nr_c, int nc_r, \
+ int n, \
+ std::pair<T *, T *> a, int a_tcols, \
+ std::pair<T *, T *> b, int b_tcols, \
+ std::pair<T *, T *> c, int c_tcols, \
+ int mflag ) \
+ { \
+ SALFCN( (SAL_T *) &a, a_tcols, \
+ (SAL_T *) &b, b_tcols, \
+ (SAL_T *) &c, c_tcols, \
+ nr_c, nc_r, n, \
+ mflag, 0 ); \
+ }
+
+ VSIP_IMPL_SAL_MAT_PROD( float, float, mat_mulx );
+ VSIP_IMPL_SAL_MAT_PROD( double, double, mat_muldx );
+ VSIP_IMPL_SAL_MAT_PROD( complex<float>, COMPLEX, cmat_mulx );
+ VSIP_IMPL_SAL_MAT_PROD( complex<double>, DOUBLE_COMPLEX, cmat_muldx );
+ VSIP_IMPL_SAL_MAT_PROD_SPLIT( float, COMPLEX_SPLIT, zmat_mulx );
+ VSIP_IMPL_SAL_MAT_PROD_SPLIT( double, DOUBLE_COMPLEX_SPLIT, zmat_muldx );
+
+ #undef VSIP_IMPL_SAL_MAT_PROD
+
+
+
+ // Functions that support rows with variable stride. [Note: complex versions
+ // are deprecated according to Mercury SAL documentation, but are recommended
+ // at this time for good performance -- 2005-10-23.]
+
+ #define VSIP_IMPL_SAL_PROD( T, SAL_T, SALFCN, STRIDE_X ) \
+ inline void \
+ mmul( int m, int n, int p, \
+ T *a, int as, \
+ T *b, int bs, \
+ T *c, int cs ) \
+ { \
+ SALFCN( (SAL_T *) a, STRIDE_X * as, \
+ (SAL_T *) b, STRIDE_X * bs, \
+ (SAL_T *) c, STRIDE_X * cs, \
+ m, n, p, 0 ); \
+ }
+
+ #define VSIP_IMPL_SAL_PROD_SPLIT( T, SAL_T, SALFCN ) \
+ inline void \
+ mmul( int m, int n, int p, \
+ std::pair<T *, T *> a, int as, \
+ std::pair<T *, T *> b, int bs, \
+ std::pair<T *, T *> c, int cs ) \
+ { \
+ SALFCN( (SAL_T *) &a, as, \
+ (SAL_T *) &b, bs, \
+ (SAL_T *) &c, cs, \
+ m, n, p, 0 ); \
+ }
+
+ VSIP_IMPL_SAL_PROD( float, float, mmulx, 1 );
+ VSIP_IMPL_SAL_PROD( double, double, mmuldx, 1 );
+ VSIP_IMPL_SAL_PROD( complex<float>, COMPLEX, cmmulx, 2 );
+ VSIP_IMPL_SAL_PROD( complex<double>, DOUBLE_COMPLEX, cmmuldx, 2 );
+ VSIP_IMPL_SAL_PROD_SPLIT( float, COMPLEX_SPLIT, zmmulx );
+ VSIP_IMPL_SAL_PROD_SPLIT( double, DOUBLE_COMPLEX_SPLIT, zmmuldx );
+
+ #undef VSIP_IMPL_SAL_PROD
+ #undef VSIP_IMPL_SAL_PROD_SPLIT
+
+
+
+ #define VSIP_IMPL_SAL_VSMUL( T, SAL_T, SALFCN, STRIDE_X ) \
+ inline void \
+ vsmul( T *a, int as, \
+ T *b, \
+ T *c, int cs, \
+ int n ) \
+ { \
+ SALFCN( (SAL_T *) a, STRIDE_X * as, \
+ (SAL_T *) b, \
+ (SAL_T *) c, STRIDE_X * cs, \
+ n, 0 ); \
+ }
+
+ #define VSIP_IMPL_SAL_VSMUL_SPLIT( T, SAL_T, SALFCN, STRIDE_X ) \
+ inline void \
+ vsmul( std::pair<T *, T *> a, int as, \
+ std::pair<T *, T *> b, \
+ std::pair<T *, T *> c, int cs, \
+ int n ) \
+ { \
+ SALFCN( (SAL_T *) &a, STRIDE_X * as, \
+ (SAL_T *) &b, \
+ (SAL_T *) &c, STRIDE_X * cs, \
+ n, 0 ); \
+ }
+
+ VSIP_IMPL_SAL_VSMUL( float, float, vsmulx, 1 );
+ VSIP_IMPL_SAL_VSMUL( double, double, vsmuldx, 1 );
+ VSIP_IMPL_SAL_VSMUL( complex<float>, COMPLEX, cvcsmlx, 2 );
+ VSIP_IMPL_SAL_VSMUL( complex<double>, DOUBLE_COMPLEX, cvcsmldx, 2 );
+ VSIP_IMPL_SAL_VSMUL_SPLIT( float, COMPLEX_SPLIT, zvzsmlx, 1 );
+ VSIP_IMPL_SAL_VSMUL_SPLIT( double, DOUBLE_COMPLEX_SPLIT, zvzsmldx, 1 );
+
+
+
+
+ template <typename T>
+ struct Sal_traits
+ {
+ static bool const valid = false;
+ };
+
+ template <>
+ struct Sal_traits<float>
+ {
+ static bool const valid = true;
+ static char const trans = 't';
+ };
+
+ template <>
+ struct Sal_traits<double>
+ {
+ static bool const valid = true;
+ static char const trans = 't';
+ };
+
+ template <>
+ struct Sal_traits<std::complex<float> >
+ {
+ static bool const valid = true;
+ static char const trans = 'c';
+ };
+
+ template <>
+ struct Sal_traits<std::complex<double> >
+ {
+ static bool const valid = true;
+ static char const trans = 'c';
+ };
+
+
+
+ // types used by serial-evaluator classes:
+
template <typename Type>
struct Is_type_supported
{
*************** struct Serial_expr_evaluator_base
*** 182,187 ****
--- 393,399 ----
return true;
}
};
+
} // namespace vsip::impl::sal
|