Actions

icon Post
text/html Subscribe
text/html Unsubscribe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [vsipl++] [patch] SAL dispatch for matrix and vector products


  • Subject: Re: [vsipl++] [patch] SAL dispatch for matrix and vector products
  • From: Jules Bergmann <jules@xxxxxxxxxxxxxxxx>
  • Date: Thu, 27 Oct 2005 09:28:59 -0400

Don,

Don McCoy wrote:
I am working on BLAS dispatch as well. Patch to follow. This one just includes SAL. Hopefully I've separated them well.

Two issues worth pointing out:

1) The exec() function checks for row-major ordering because it calls the newer SAL functions (mat_mul) that allow the column-stride to be specified. I believe the rows must be unit stride. This is a little less general than the older ones (mulx) which allow non-unit strides. Recently, we heard back from Mercury that the older ones perform better (at this time). Given that the older ones handle non-unit strides and are faster, should we revert to using those? If Mercury changes in the future, then we can follow.

Yes, we should revert to the old ones for now.

Because the old and new functions have different dispatch requirements (for supported strides and mixing of dimension orderings), we should have separate evaluators for each (as opposed to trying to hide the different in sal::mmul).

We almost need a Venn diagram to represent the non-overlapping sets of functionality:

The old matrix-multiply
 - required all operands to have the same dimension-ordering
 - supported non-unit stride in the minor dimension
 - required the major dimension to be "dense".  I.e. the major dimension
   stride == minor dimenson size * minor dimension stride.
 - provided a special case for unit-stride minor dimension (so called
   "fast" versions)

The new matrix-multiply
 - supports mixing of dimension-ordering (via the transpose flags)
 - requires unit-stride in the minor dimensions
 - allows major dimensions to be non-dense, via the column stride.



2) Split-complex products (other than vector-vector) are not handled at this time. Just a reminder that we were going to discuss how to address this issue sometime.

We should be able to handle this by:

 - providing overloads of sal::mmul for std::pair<T*, T*>, and
 - checking that all the matrices have the same complex format in ct_valid.

Granted, we wont be able to fully exercise this until we get prod integrated into the expression templates.


More comments below on the matrix-matrix evaluator. Some may apply to the matrix-vector and vector-matrix evaluators too.

				-- Jules


------------------------------------------------------------------------


+ + + // SAL evaluator for matrix-matrix products. + + template <typename Block0,
+ 	  typename Block1,
+ 	  typename Block2>
+ struct Evaluator<Op_prod_mm, Block0, Op_list_2<Block1, Block2>,
+ 		 Mercury_sal_tag>
+ {
+   typedef typename Block0::value_type T;

You could move these typedefs here so the new ct_valid conditions below fit on a single line:
    typedef typename Block_layout<Block0>::order_type order0_type;
    ...

    typedef typename Block_layout<Block0>::complex_type complex0_type;
    ...

+ + static bool const ct_valid = + impl::sal::Sal_traits<T>::valid &&
+     Type_equal<T, typename Block1::value_type>::value &&
+     Type_equal<T, typename Block2::value_type>::value &&
+     // check that direct access is supported
+     Ext_data_cost<Block0>::value == 0 &&
+     Ext_data_cost<Block1>::value == 0 &&
+     Ext_data_cost<Block2>::value == 0;

Assuming that we're going to modify this evaluator to handle the old matrix multiply, the ct_valid should also check that all three matrices have the same dimension ordering.

Also check that all the matrices have the same complex format (this applies to both the old and new multiply).

+ + static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b)
+   {
+     typedef typename Block_layout<Block0>::order_type order0_type;
+     typedef typename Block_layout<Block1>::order_type order1_type;
+     typedef typename Block_layout<Block2>::order_type order2_type;
+ + Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+     Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+     Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+ + bool is_r_row = Type_equal<order0_type, row2_type>::value;
+     bool is_a_row = Type_equal<order1_type, row2_type>::value;
+     bool is_b_row = Type_equal<order2_type, row2_type>::value;
+ + bool unit_stride = false;
+     if ( is_r_row && is_a_row && is_b_row )
+       unit_stride = (ext_a.stride(1) == 1) && (ext_b.stride(1) == 1);
+ + return unit_stride;

For the old matrix multiply, we should check that

	dimension_type const dim0 = order0_type::impl_dim0;
	dimension_type const dim1 = order0_type::impl_dim1;

	stride(dim0) == size(dim1) * stride(dim1)

for each matrix (using impl_dim0 and impl_dim1 should make this work for both the row- and column-major cases).

+   }
+ + static void exec(Block0& r, Block1 const& a, Block2 const& b)
+   {
+     typedef typename Block0::value_type RT;
+ + typedef typename Block_layout<Block0>::order_type order0_type;
+     typedef typename Block_layout<Block1>::order_type order1_type;
+     typedef typename Block_layout<Block2>::order_type order2_type;
+ + Ext_data<Block0> ext_r(const_cast<Block0&>(r));
+     Ext_data<Block1> ext_a(const_cast<Block1&>(a));
+     Ext_data<Block2> ext_b(const_cast<Block2&>(b));
+

(For the old matrix multiply) At this point, we can assume the three matrices are either all row-major or all column-major.

+     if (Type_equal<order0_type, row2_type>::value)
+     {
If row major, compute r = a b
+       // SAL supports column-stride parameter only (rows must be unit-stride)
+       sal::mmul( a.size(2, 0), // M
+                  b.size(2, 1), // N
+                  a.size(2, 1), // P
+                  ext_a.data(), ext_a.stride(0),
+                  ext_b.data(), ext_b.stride(0),
+                  ext_r.data(), ext_r.stride(0) );
+     }

If column major, we can use relation:

	(r = a b) <=> (trans(r) = trans(b) trans(a))	

To compute r.