The previous posts in this series on C++ AMP array_view covered:
- Introduction to array_view and some of its key semantic aspects
- Implicit synchronization on destruction of array_views
- array_view discard_data function
- Caching and coherence policies underlying array_view implementation
In this post we will look at using array_views with staging arrays.
array_views with a staging array as data source
As described in a previous post, C++ AMP provides staging arrays for efficient data transfers between the host and accelerators. A staging array can only be accessed on the accelerator_view where it is allocated. It also has an associated accelerator_view (returned by the get_associated_accelerator_view method of concurrency::array) to and from which it can be copied efficiently. When a staging array is the host memory data source for an array_view, implicit data transfers between the staging array and its associated accelerator_view are faster than for an array_view on regular (non-staging) host memory, where an extra intermediate copy to a temporary staging buffer is performed.
Staging arrays have certain limitations that you must be aware of if you choose to use them as the data source for array_views. It is NOT safe to access a staging array while a copy from (or to) that staging array is in progress. Hence, for an array_view with a staging array as its data source, any operation that may transfer data from the staging array to its associated accelerator_view (or vice versa) must not execute concurrently with another operation that accesses the array_view on the CPU, or on another accelerator_view where the array_view is not already cached. Any such concurrent operations have undefined behavior (for example, they may cause an access violation).
Guidelines for using a staging array as an array_view data source
Guideline A: Consider using staging arrays as your array_view data source if the view is to be accessed only on the host plus exactly one accelerator_view.
accelerator_view cpuAv = accelerator(accelerator::cpu_accelerator).default_view;
// Guideline A: Use a staging array as the data source for an array_view
// to be used in a parallel_for_each computation, for faster transfer of data
// between the CPU and the accelerator
// Without a staging array, the data source would be regular host memory:
//   std::vector<float> sourceVec(size);
//   float *hostPtr = sourceVec.data();
concurrency::array<float> sourceArray(size, cpuAv, accelerator().default_view);
float *hostPtr = sourceArray.data();
std::generate(hostPtr, hostPtr + size, rand);
// Using a staging array as the data source for the array_view
// results in faster transfer of data from the CPU to the accelerator_view
// where the parallel_for_each kernel executes
// (the non-staging equivalent would be: array_view<float> dataView(size, sourceVec);)
array_view<float> dataView(sourceArray);
parallel_for_each(dataView.extent, [=](index<1> idx) restrict(amp) {
    dataView(idx) = fast_math::cos(dataView(idx));
});
// Using a staging array as the data source for the array_view
// also results in faster transfer of data from the accelerator_view
// to the CPU
dataView.synchronize();
Guideline B: Exercise extreme caution when using array_views over staging arrays in multi-threaded CPU code that can access such array_views concurrently from multiple threads. As described earlier, an access that overlaps with a transfer to or from the staging array has undefined behavior and may result in fatal errors.
accelerator_view cpuAv = accelerator(accelerator::cpu_accelerator).default_view;
concurrency::array<float> sourceArray(size, cpuAv, accelerator().default_view);
float *hostPtr = sourceArray.data();
std::generate(hostPtr, hostPtr + size, rand);
array_view<const float> sourceView(sourceArray);
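// Use a named array: array_view<float> outputView(array<float>(size)) would be
// parsed as a function declaration and would also refer to a temporary
concurrency::array<float> outputArray(size);
array_view<float> outputView(outputArray);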
std::vector<float> sourceCopy(size);
concurrency::task<void> t([&]() {
    for (int i = 0; i < size; ++i) {
        sourceCopy[i] = sourceView[i];
    }
});
// Guideline B violation: An array_view over a staging array should
// not be concurrently accessed on the CPU as in the concurrency::task above
// (or another accelerator_view) with an operation that transfers data from
// the staging array to the associated_accelerator_view of the staging array
// (the parallel_for_each invocation results in such a transfer here)
parallel_for_each(sourceView.extent, [=](index<1> idx) restrict(amp) {
    outputView(idx) = fast_math::cos(sourceView(idx));
});
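One way to stay within Guideline B is to make sure the CPU-side access has finished before starting the operation that triggers the transfer. The sketch below models only that ordering in standard C++, so it can be compiled without C++ AMP. Everything in it is a hypothetical stand-in and not part of the C++ AMP API: a std::vector plays the role of the staging array, run_kernel_model plays the role of the parallel_for_each invocation, and copy_then_compute is an illustrative helper name. In the snippet above, the corresponding step would most likely be to call t.wait() on the concurrency::task before invoking parallel_for_each.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the parallel_for_each kernel: reads the
// "staging" buffer and writes cos(x) for each element into a new output.
std::vector<float> run_kernel_model(const std::vector<float>& staging) {
    std::vector<float> output(staging.size());
    for (std::size_t i = 0; i < staging.size(); ++i) {
        output[i] = std::cos(staging[i]);
    }
    return output;
}

// Copies the buffer on a worker thread, waits for that copy to finish,
// and only then starts the operation that models the transfer and kernel.
std::vector<float> copy_then_compute(const std::vector<float>& staging,
                                     std::vector<float>& sourceCopy) {
    std::future<void> reader =
        std::async(std::launch::async, [&staging, &sourceCopy] {
            sourceCopy.assign(staging.begin(), staging.end());
        });

    // The essential step: block until the CPU-side reader is done, so the
    // read and the (modeled) transfer never overlap.
    reader.get();
    assert(sourceCopy.size() == staging.size());

    return run_kernel_model(staging);
}
```

The cost of this ordering is that the CPU copy and the accelerator work no longer overlap. If that overlap matters, a regular (non-staging) data source may be the safer choice for a view that several threads touch.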
In closing
In this post we looked at key aspects of using staging arrays as the data source for array_views. Subsequent posts will dive into other functional and performance aspects of array_view, so stay tuned!
I would love to hear your feedback, comments and questions below or in our MSDN forum.