diff --git a/README.md b/README.md
index 7ce7462..9e1e52a 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
- Document Number: N4506
- Date: 2015-05-05
+ Document Number: N4699
+ Date: 2017-10-16
  Revises:
  Project: Programming Language C++
  Project Number: TS 19570
@@ -7,24 +7,22 @@
  NVIDIA Corporation
  jhoberock@nvidia.com

-# Parallelism TS Editor's Report, post-Lenexa mailing
+# Parallelism TS Editor's Report, pre-Albuquerque mailing

-N4505 is the latest Parallelism TS Working Draft. It contains editorial and technical changes to the Parallelism TS to apply the following revisions:
+N4698 is the proposed working draft of Parallelism TS Version 2. It contains changes to the Parallelism TS as directed by the committee at the Toronto meeting, and editorial changes.

- * N4274 - Relaxing Packing Rules for Exceptions Thrown by Parallel Algorithms - Proposed Wording (Revision 1)
- * Feature test macro for the Parallelism TS
+N4698 updates the previous draft, N4669, published in the pre-Toronto mailing.

-N4505 updates the previous draft, N4407, published in the pre-Lenexa mailing.
+# Technical Changes

-N4507 is document N4505 reformatted as a TS document. It updates N4409, which was published in the pre-Lenexa mailing.
+* Apply P0076R4 - Vector and Wavefront Policies.

-## Technical Changes
+# Editorial Changes

-* Applied N4274, which relaxes the exception packaging rules for exceptions thrown by parallel algorithms. Additionally, changed instances of "terminates with (exception)" phrasing to "exits via (exception)", as directed by the Library Working Group.
+* Reformat Table 1 - Feature Test Macro(s), to match the style of the Library Fundamentals TS.

-* Introduced the feature test macro `__cpp_lib_experimental_parallel_algorithm` for the functionality of the Parallelism TS as directed by SG1.
+# Notes

-## Editorial Changes
-
-* Promoted subsection 1.3.1, which was incorrectly grouped under section 1.3, to section 1.4.
+* The pre-existing content of N4698 has not yet been harmonized with C++17. As a result, this content is named and namespaced inconsistently with the newly applied content of P0076R4. We anticipate that these inconsistencies will be harmonized by a future revision.
+* N4698 contains forward references to `for_loop` and `for_loop_strided`. We anticipate their introduction in a future revision.
diff --git a/algorithms.html b/algorithms.html
index 0a62818..17ec63d 100644
--- a/algorithms.html
+++ b/algorithms.html
@@ -88,6 +88,37 @@

Effect of execution policies on algorithm execution

incremented correctly. + +

+ The invocations of element access functions in parallel algorithms invoked with an + execution policy of type unsequenced_policy are permitted to execute + in an unordered fashion in the calling thread, unsequenced with respect to one another + within the calling thread. + + + This means that multiple function object invocations may be interleaved on a single thread. + +

+
+ + + This overrides the usual guarantee from the C++ standard, Section 1.9 [intro.execution] that + function executions do not interleave with one another. + +

+
+ + +

+ The invocations of element access functions in parallel algorithms invoked with an + execution policy of type vector_policy are permitted to execute + in an unordered fashion in the calling thread, unsequenced with respect to one another + within the calling thread, subject to the sequencing constraints of wavefront application + () for the last argument to + for_loop or for_loop_strided. +

+
+

The invocations of element access functions in parallel algorithms invoked with an execution policy of type parallel_vector_execution_policy @@ -163,6 +194,107 @@

Effect of execution policies on algorithm execution

+ +

Wavefront Application

+ +

+ For the purposes of this section, an evaluation is a value computation or side effect of + an expression, or an execution of a statement. Initialization of a temporary object is considered a + subexpression of the expression that necessitates the temporary object. +

+ +

+ An evaluation A contains an evaluation B if: + +

+ + This includes evaluations occurring in function invocations. +

+ +

+ An evaluation A is ordered before an evaluation B if A is deterministically + sequenced before B. If A is indeterminately sequenced with respect to B + or A and B are unsequenced, then A is not ordered before B and B is not ordered + before A. The ordered before relationship is transitive. +

+ +

+ For an evaluation A ordered before an evaluation B, both contained in the same + invocation of an element access function, A is a vertical antecedent of B if: + +

+ + + Vertical antecedent is an irreflexive, antisymmetric, nontransitive relationship between two evaluations. + Informally, A is a vertical antecedent of B if A is sequenced immediately before B or A is nested zero or + more levels within a statement S that immediately precedes B. + +

+ +

+ In the following, Xi and Xj refer to evaluations of the same expression + or statement contained in the application of an element access function corresponding to the ith and + jth elements of the input sequence. There might be several evaluations Xk, + Yk, etc. of a single expression or statement in application k, for example, if the + expression or statement appears in a loop within the element access function. +

+ +

+ Horizontally matched is an equivalence relationship between two evaluations of the same expression. An + evaluation Bi is horizontally matched with an evaluation Bj if: + +

+ + + Horizontally matched establishes a theoretical lock-step relationship between evaluations in different applications of an element access function. + +

+ +

+ Let f be a function called for each argument list in a sequence of argument lists. + Wavefront application of f requires that evaluation Ai be sequenced + before evaluation Bj if i < j and: + +

+ + + Wavefront application guarantees that parallel applications i and j execute such that progress on application j never gets ahead of application i. + + + + The relationships between Ai and Bi and between Aj and Bj are sequenced before, not vertical antecedent. + +

+
+
+

ExecutionPolicy algorithm overloads

@@ -365,7 +497,7 @@

Header <experimental/algorithm> synopsis

namespace std { namespace experimental { namespace parallel { -inline namespace v1 { +inline namespace v2 { template<class ExecutionPolicy, class InputIterator, class Function> void for_each(ExecutionPolicy&& exec, @@ -379,6 +511,20 @@

Header <experimental/algorithm> synopsis

InputIterator for_each_n(ExecutionPolicy&& exec, InputIterator first, Size n, Function f); + +namespace execution { + + template<class F> + auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)()); + + + template<class T> + class ordered_update_t; + + + template<class T> + ordered_update_t<T> ordered_update(T& ref) noexcept; +} } } } @@ -487,6 +633,143 @@

For each

+ + +

No vec

+ + + + template<class F> +auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)()); + + + Evaluates std::forward<F>(f)(). When invoked within an element access function + in a parallel algorithm using vector_policy, if two calls to no_vec are + horizontally matched within a wavefront application of an element access function over input + sequence S, then the execution of f in the application for one element in S is + sequenced before the execution of f in the application for a subsequent element in + S; otherwise, there is no effect on sequencing. + + + + the result of f. + + + + If f returns a result, the result is ignored. + + + + If f exits via an exception, then terminate will be called, consistent + with all other potentially-throwing operations invoked with vector_policy execution. + + +
extern int* p;
+for_loop(vec, 0, n, [&](int i) {
+  y[i] += y[i+1];
+  if(y[i] < 0) {
+    no_vec([&]{
+      *p++ = i;
+    });
+  }
+});
+ + The updates *p++ = i will occur in the same order as if the policy were seq. +
+
+
+
+
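The ordering guarantee in the `no_vec` example can be illustrated without a TS implementation. The following is a minimal sketch under stated assumptions, not the TS's implementation: `sketch::no_vec` simply invokes its argument (serial execution trivially satisfies the ordering), `sketch::serial_for_loop` is a hypothetical stand-in for the forward-referenced `for_loop(vec, ...)`, and `negative_indices` is an illustrative name that does not appear in the draft.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical serial fallback: just invoke f and return its result.
template <class F>
auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)()) {
    return std::forward<F>(f)();
}

// Hypothetical stand-in for for_loop(vec, first, last, body): runs in order.
template <class F>
void serial_for_loop(int first, int last, F body) {
    for (int i = first; i < last; ++i) body(i);
}

}  // namespace sketch

// Mirrors the example: record each index i whose updated y[i] is negative.
inline std::vector<int> negative_indices(std::vector<int> y) {
    std::vector<int> out(y.size());
    int* p = out.data();
    const int n = static_cast<int>(y.size()) - 1;  // the body reads y[i+1]
    sketch::serial_for_loop(0, n, [&](int i) {
        y[i] += y[i + 1];
        if (y[i] < 0) {
            sketch::no_vec([&] { *p++ = i; });
        }
    });
    out.resize(static_cast<std::size_t>(p - out.data()));
    return out;
}

inline void check_negative_indices() {
    // Indices are recorded in increasing order, as with a sequential policy.
    assert(negative_indices({1, -5, 2, -1, -3}) == std::vector<int>({0, 1, 3}));
}
```

Under a real `vector_policy`, the point of `no_vec` is that the recorded indices would still appear in this order even though the surrounding loop body may be interleaved across iterations.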
+ + +

Ordered update class

+ + +
+template<class T>
+class ordered_update_t {
+  T& ref_; // exposition only
+public:
+  ordered_update_t(T& loc) noexcept
+    : ref_(loc) {}
+  ordered_update_t(const ordered_update_t&) = delete;
+  ordered_update_t& operator=(const ordered_update_t&) = delete;
+
+  template <class U>
+    auto operator=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ = std::move(rhs); }); }
+  template <class U>
+    auto operator+=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ += std::move(rhs); }); }
+  template <class U>
+    auto operator-=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ -= std::move(rhs); }); }
+  template <class U>
+    auto operator*=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ *= std::move(rhs); }); }
+  template <class U>
+    auto operator/=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ /= std::move(rhs); }); }
+  template <class U>
+    auto operator%=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ %= std::move(rhs); }); }
+  template <class U>
+    auto operator>>=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ >>= std::move(rhs); }); }
+  template <class U>
+    auto operator<<=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ <<= std::move(rhs); }); }
+  template <class U>
+    auto operator&=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ &= std::move(rhs); }); }
+  template <class U>
+    auto operator^=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ ^= std::move(rhs); }); }
+  template <class U>
+    auto operator|=(U rhs) const noexcept
+      { return no_vec([&]{ return ref_ |= std::move(rhs); }); }
+
+  auto operator++() const noexcept
+    { return no_vec([&]{ return ++ref_; }); }
+  auto operator++(int) const noexcept
+    { return no_vec([&]{ return ref_++; }); }
+  auto operator--() const noexcept
+    { return no_vec([&]{ return --ref_; }); }
+  auto operator--(int) const noexcept
+    { return no_vec([&]{ return ref_--; }); }
+};
+
+ +

+ An object of type ordered_update_t<T> is a proxy for an object of type T + intended to be used within a parallel application of an element access function using a + policy object of type vector_policy. Simple increments, assignments, and compound + assignments to the object are forwarded to the proxied object, but are sequenced as though + executed within a no_vec invocation. + + + The return-value deduction of the forwarded operations results in these operations returning by + value, not reference. This formulation prevents accidental collisions on accesses to the return + value. + +

+
+
+ + +

Ordered update function template

+ + + + template<class T> +ordered_update_t<T> ordered_update(T& loc) noexcept; + + + + { loc }. + + + +
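To make the proxy behavior concrete, here is a reduced sketch of `ordered_update_t` and `ordered_update` with only two of the forwarded operators. It assumes a serial `sketch::no_vec` that merely invokes its argument; the `sketch` namespace and `sum_one_to_four` are illustrative names, not part of the draft. The `static_assert` checks the by-value return described in the note above.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical serial fallback for no_vec: just invoke f.
template <class F>
auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)()) {
    return std::forward<F>(f)();
}

template <class T>
class ordered_update_t {
    T& ref_;  // exposition only
public:
    ordered_update_t(T& loc) noexcept : ref_(loc) {}
    ordered_update_t(const ordered_update_t&) = delete;
    ordered_update_t& operator=(const ordered_update_t&) = delete;

    // The lambda's deduced return type decays, so the result is a value.
    template <class U>
    auto operator+=(U rhs) const noexcept {
        return no_vec([&] { return ref_ += std::move(rhs); });
    }
    auto operator++(int) const noexcept {
        return no_vec([&] { return ref_++; });
    }
};

template <class T>
ordered_update_t<T> ordered_update(T& loc) noexcept {
    return {loc};
}

}  // namespace sketch

// Forwarded compound assignment yields int, not int&.
static_assert(
    std::is_same<decltype(sketch::ordered_update(std::declval<int&>()) += 1),
                 int>::value,
    "forwarded operations return by value");

inline int sum_one_to_four() {
    int total = 0;
    for (int i = 1; i <= 4; ++i) {
        sketch::ordered_update(total) += i;  // forwarded to total
    }
    assert(total == 10);
    return total;
}
```

Returning by value means a caller never holds a reference into the shared object, which is the collision the note is guarding against.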
@@ -499,7 +782,7 @@

Header <experimental/numeric> synopsis

namespace std { namespace experimental { namespace parallel { -inline namespace v1 { +inline namespace v2 { template<class InputIterator> typename iterator_traits<InputIterator>::value_type reduce(InputIterator first, InputIterator last); @@ -772,7 +1055,7 @@

Inclusive scan

OutputIterator inclusive_scan(InputIterator first, InputIterator last, OutputIterator result, BinaryOperation binary_op); - template<class InputIterator, class OutputIterator, class BinaryOperation> + template<class InputIterator, class OutputIterator, class BinaryOperation, class T> OutputIterator inclusive_scan(InputIterator first, InputIterator last, OutputIterator result, BinaryOperation binary_op, T init); diff --git a/exceptions.html b/exceptions.html index 4c07716..d32baef 100644 --- a/exceptions.html +++ b/exceptions.html @@ -9,40 +9,37 @@

Exception reporting behavior

During the execution of a standard parallel algorithm, if the invocation of an element access function - exits viaterminates with an uncaught exception, the behavior of the program is determined by the type of + exits via an uncaught exception, the behavior of the program is determined by the type of execution policy used to invoke the algorithm:

These functions are herein called element access functions. @@ -1361,11 +1410,10 @@

Contents

-
1.5 [parallel.general.features]
+
1.5

Feature-testing recommendations

[parallel.general.features]
- -

Feature-testing recommendations

-

An implementation that provides support for this Technical Specification shall define the feature test macro(s) in Table 1.

+ +

An implementation that provides support for this Technical Specification shall define the feature test macro(s) in Table 1.

@@ -1390,10 +1438,16 @@

Feature-testing recommendations

<experimental/numeric> + + + + +
__cpp_lib_experimental_parallel_task_block 201510 + <experimental/task_block>
+
-
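A program might consult the macro from Table 1 along the following lines. This is a sketch, assuming the implementation supports `__has_include` (standard since C++17, a common extension earlier); `task_block_feature_level` is a hypothetical helper name, not something the TS defines.

```cpp
#include <cassert>

// Include the header only if the implementation ships it.
#ifdef __has_include
#  if __has_include(<experimental/task_block>)
#    include <experimental/task_block>
#  endif
#endif

// Returns the macro's value when defined, or 0 when the feature is absent.
constexpr long task_block_feature_level() {
#ifdef __cpp_lib_experimental_parallel_task_block
    return __cpp_lib_experimental_parallel_task_block;
#else
    return 0L;
#endif
}

inline void check_feature_level() {
    // Either the feature is absent, or it is at least the Table 1 value.
    assert(task_block_feature_level() == 0L ||
           task_block_feature_level() >= 201510L);
}
```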
@@ -1472,14 +1526,14 @@

Feature-testing recommendations

-
2.2

Header <experimental/execution_policy> synopsis

[parallel.execpol.synopsis]
+
2.2

Header <experimental/execution_policy> synopsis

[parallel.execpol.synopsis]
namespace std {
 namespace experimental {
 namespace parallel {
-inline namespace v1 {
+inline namespace v2 {
   // 2.3, Execution policy type trait
   template<class T> struct is_execution_policy;
   template<class T> constexpr bool is_execution_policy_v = is_execution_policy<T>::value;
@@ -1514,7 +1568,10 @@ 

Feature-testing recommendations

template<class T> struct is_execution_policy { see below };
 
-

is_execution_policy can be used to detect parallel execution policies for the purpose of excluding function signatures from otherwise ambiguous overload resolution participation.

+

is_execution_policy + can be used to detect parallel execution policies for the purpose of +excluding function signatures from otherwise ambiguous overload +resolution participation.

is_execution_policy<T> shall be a UnaryTypeTrait with a BaseCharacteristic of true_type if T is the type of a standard or implementation-defined execution policy, otherwise false_type. @@ -1543,7 +1600,10 @@
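The overload-exclusion use of `is_execution_policy` can be sketched as follows. Everything in the `sketch` namespace is a simplified stand-in: the two policy structs are empty placeholders, and `which_overload` is a hypothetical function that only reports which signature was selected. The two signatures are modeled on a three-argument algorithm where `(first, last, init)` and `(exec, first, last)` would otherwise compete.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Placeholder policy types for illustration only.
struct sequential_execution_policy {};
struct parallel_execution_policy {};

template <class T> struct is_execution_policy : std::false_type {};
template <> struct is_execution_policy<sequential_execution_policy> : std::true_type {};
template <> struct is_execution_policy<parallel_execution_policy> : std::true_type {};

// Overload A: no policy; the third argument is an initial value.
template <class InputIterator, class T>
int which_overload(InputIterator, InputIterator, T) { return 0; }

// Overload B: policy first. It participates in overload resolution only
// when the decayed first argument type is an execution policy.
template <class ExecutionPolicy, class InputIterator,
          class = typename std::enable_if<is_execution_policy<
              typename std::decay<ExecutionPolicy>::type>::value>::type>
int which_overload(ExecutionPolicy&&, InputIterator, InputIterator) { return 1; }

}  // namespace sketch

inline void check_overloads() {
    int data[3] = {1, 2, 3};
    int* p = data;
    // Three arguments of the same type: overload B is excluded by the trait.
    assert(sketch::which_overload(p, p, p) == 0);
    // A policy argument selects overload B; overload A fails deduction.
    sketch::parallel_execution_policy par;
    assert(sketch::which_overload(par, p, p) == 1);
}
```

Without the `enable_if` constraint, the first call would have two viable candidates with equally good conversions, which is the ambiguity the trait is meant to prevent.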

Feature-testing recommendations

class sequential_execution_policy{ unspecified };
 
-

The class sequential_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm's execution may not be parallelized.

+

The class sequential_execution_policy + is an execution policy type used as a unique type to disambiguate +parallel algorithm overloading and require that a parallel algorithm's +execution may not be parallelized.

@@ -1559,7 +1619,10 @@

Feature-testing recommendations

class parallel_execution_policy{ unspecified };
 
-

The class parallel_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized.

+

The class parallel_execution_policy + is an execution policy type used as a unique type to disambiguate +parallel algorithm overloading and indicate that a parallel algorithm's +execution may be parallelized.

@@ -1575,7 +1638,10 @@

Feature-testing recommendations

class parallel_vector_execution_policy{ unspecified };
 
-

The class class parallel_vector_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized and parallelized.

+

The class parallel_vector_execution_policy + is an execution policy type used as a unique type to disambiguate +parallel algorithm overloading and indicate that a parallel algorithm's +execution may be vectorized and parallelized.

@@ -1781,7 +1847,7 @@

Feature-testing recommendations

During the execution of a standard parallel algorithm, if the invocation of an element access function - exits viaterminates with an uncaught exception, the behavior of the program is determined by the type of + exits via an uncaught exception, the behavior of the program is determined by the type of execution policy used to invoke the algorithm: