ACloudViewer
3.9.4
A Modern Library for 3D Data Processing
#include <Indexer.h>

Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | Indexer () | |
| | Indexer (const Indexer &)=default | |
| Indexer & | operator= (const Indexer &)=default | |
| | Indexer (const std::vector< Tensor > &input_tensors, const Tensor &output_tensor, DtypePolicy dtype_policy=DtypePolicy::ALL_SAME, const SizeVector &reduction_dims={}) | |
| | Indexer (const std::vector< Tensor > &input_tensors, const std::vector< Tensor > &output_tensors, DtypePolicy dtype_policy=DtypePolicy::ALL_SAME, const SizeVector &reduction_dims={}) | |
| bool | CanUse32BitIndexing () const | Returns true iff the maximum offsets in bytes are smaller than 2^31 - 1. |
| IndexerIterator | SplitTo32BitIndexing () const | |
| std::unique_ptr< Indexer > | SplitLargestDim () | |
| Indexer | GetPerOutputIndexer (int64_t output_idx) const | |
| bool | ShouldAccumulate () const | |
| bool | IsFinalOutput () const | |
| void | ShrinkDim (int64_t dim, int64_t start, int64_t size) | |
| int64_t | NumReductionDims () const | Returns the number of reduction dimensions. |
| int64_t | NumDims () const | Returns the number of dimensions of the Indexer. |
| const int64_t * | GetPrimaryShape () const | |
| int64_t * | GetPrimaryShape () | |
| const int64_t * | GetPrimaryStrides () const | |
| int64_t | NumWorkloads () const | |
| int64_t | NumOutputElements () const | Returns the number of output elements. |
| int64_t | NumInputs () const | Number of input Tensors. |
| int64_t | NumOutputs () const | Number of output Tensors. |
| TensorRef & | GetInput (int64_t i) | Returns input TensorRef. |
| const TensorRef & | GetInput (int64_t i) const | |
| TensorRef & | GetOutput (int64_t i) | Returns output TensorRef. |
| const TensorRef & | GetOutput (int64_t i) const | |
| TensorRef & | GetOutput () | |
| const TensorRef & | GetOutput () const | |
| bool | IsReductionDim (int64_t dim) const | Returns true if the dim-th dimension is reduced. |
| CLOUDVIEWER_HOST_DEVICE char * | GetInputPtr (int64_t input_idx, int64_t workload_idx) const | |
| template<typename T> CLOUDVIEWER_HOST_DEVICE T * | GetInputPtr (int64_t input_idx, int64_t workload_idx) const | |
| CLOUDVIEWER_HOST_DEVICE char * | GetOutputPtr (int64_t workload_idx) const | |
| template<typename T> CLOUDVIEWER_HOST_DEVICE T * | GetOutputPtr (int64_t workload_idx) const | |
| CLOUDVIEWER_HOST_DEVICE char * | GetOutputPtr (int64_t output_idx, int64_t workload_idx) const | |
| template<typename T> CLOUDVIEWER_HOST_DEVICE T * | GetOutputPtr (int64_t output_idx, int64_t workload_idx) const | |
Protected Member Functions

| Return type | Member | Description |
|---|---|---|
| void | CoalesceDimensions () | |
| void | ReorderDimensions (const SizeVector &reduction_dims) | |
| void | UpdatePrimaryStrides () | Update primary_strides_ based on primary_shape_. |
| void | UpdateContiguousFlags () | Update inputs_contiguous_ and outputs_contiguous_. |
| CLOUDVIEWER_HOST_DEVICE char * | GetWorkloadDataPtr (const TensorRef &tr, bool tr_contiguous, int64_t workload_idx) const | |
| template<typename T> CLOUDVIEWER_HOST_DEVICE T * | GetWorkloadDataPtr (const TensorRef &tr, bool tr_contiguous, int64_t workload_idx) const | |
Static Protected Member Functions

| Return type | Member | Description |
|---|---|---|
| static void | BroadcastRestride (TensorRef &src, int64_t dst_ndims, const int64_t *dst_shape) | |
| static void | ReductionRestride (TensorRef &dst, int64_t src_ndims, const int64_t *src_shape, const SizeVector &reduction_dims) | |
Protected Attributes

| Type | Member | Description |
|---|---|---|
| int64_t | num_inputs_ = 0 | Number of input and output Tensors. |
| int64_t | num_outputs_ = 0 | |
| TensorRef | inputs_ [MAX_INPUTS] | Array of input TensorRefs. |
| TensorRef | outputs_ [MAX_OUTPUTS] | Array of output TensorRefs. |
| bool | inputs_contiguous_ [MAX_INPUTS] | Array of contiguous flags for all input TensorRefs. |
| bool | outputs_contiguous_ [MAX_OUTPUTS] | Array of contiguous flags for all output TensorRefs. |
| int64_t | primary_shape_ [MAX_DIMS] | |
| int64_t | primary_strides_ [MAX_DIMS] | |
| int64_t | ndims_ = 0 | Indexer's global number of dimensions. |
| bool | final_output_ = true | |
| bool | accumulate_ = false | |
Indexing engine for elementwise ops with broadcasting support.
Fancy indexing is supported by restriding the input tensor and treating the operation as an elementwise op.
After the Indexer is constructed on the host, its indexing methods can be used from both host and device.
cloudViewer::core::Indexer::Indexer () [inline]
Definition at line 264 of file Indexer.h.
Referenced by SplitLargestDim().
cloudViewer::core::Indexer::Indexer (const Indexer &) [default]
cloudViewer::core::Indexer::Indexer (const std::vector< Tensor > & input_tensors, const Tensor & output_tensor, DtypePolicy dtype_policy = DtypePolicy::ALL_SAME, const SizeVector & reduction_dims = {})
Only a single output is supported, for simplicity. To extend this constructor to multiple outputs, one would need to check that the shapes of all outputs are compatible.
Definition at line 35 of file Indexer.cpp.
cloudViewer::core::Indexer::Indexer (const std::vector< Tensor > & input_tensors, const std::vector< Tensor > & output_tensors, DtypePolicy dtype_policy = DtypePolicy::ALL_SAME, const SizeVector & reduction_dims = {})
Definition at line 44 of file Indexer.cpp.
References cloudViewer::core::ALL_SAME, cloudViewer::core::Bool, BroadcastRestride(), CoalesceDimensions(), cloudViewer::core::INPUT_SAME, cloudViewer::core::INPUT_SAME_OUTPUT_BOOL, inputs_, LogError, cloudViewer::core::MAX_INPUTS, cloudViewer::core::MAX_OUTPUTS, cloudViewer::core::TensorRef::ndims_, ndims_, cloudViewer::core::NONE, num_inputs_, num_outputs_, outputs_, primary_shape_, ReductionRestride(), cloudViewer::core::shape_util::ReductionShape(), ReorderDimensions(), cloudViewer::core::TensorRef::shape_, cloudViewer::core::SmallVectorBase< Size_T >::size(), cloudViewer::core::Dtype::ToString(), UpdateContiguousFlags(), and UpdatePrimaryStrides().
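static void cloudViewer::core::Indexer::BroadcastRestride (TensorRef & src, int64_t dst_ndims, const int64_t * dst_shape) [static, protected]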
Broadcast src to dst by setting shape 1 for omitted dimensions and stride 0 for broadcast dimensions.
Note that other approaches may also work. For example, one could set src's shape to exactly dst's shape. In general, if a dimension has size 1, its stride has no effect on the computed offsets; likewise, if a dimension has stride 0, its size has no effect on the computed offsets.
Example: dst's dimension 0 is omitted in src, dimension 2 is broadcast, and dimension 3 is not broadcast.
[Before]
src.shape_:   [     2,  1,  1,  3]
src.strides_: [     3,  3,  3,  1]
dst.shape_:   [ 2,  2,  2,  1,  3]
dst.strides_: [12,  6,  3,  3,  1]
[After]
src.shape_:   [ 1,  2,  1,  1,  3]
src.strides_: [ 0,  3,  0,  3,  1]
| src | The source TensorRef to be broadcasted. |
| dst_ndims | Number of dimensions to be broadcasted to. |
| dst_shape | Shape to be broadcasted to. |
Definition at line 575 of file Indexer.cpp.
References cloudViewer::core::TensorRef::byte_strides_, cloudViewer::core::TensorRef::ndims_, and cloudViewer::core::TensorRef::shape_.
Referenced by Indexer().
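To make the restriding rule concrete, here is a minimal sketch on plain std::vector shapes. Ref and BroadcastRestrideSketch are hypothetical illustration names, not cloudViewer API, and unlike the real method (which updates TensorRef::byte_strides_) the sketch uses element strides so that it can reproduce the example above directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for TensorRef, using element strides for readability.
struct Ref {
    std::vector<int64_t> shape;
    std::vector<int64_t> strides;
};

// Broadcast src to dst_shape: omitted leading dims get shape 1 and stride 0,
// and size-1 dims that dst expands get stride 0. All other strides are kept.
void BroadcastRestrideSketch(Ref& src, const std::vector<int64_t>& dst_shape) {
    assert(src.shape.size() <= dst_shape.size());
    const std::size_t offset = dst_shape.size() - src.shape.size();
    Ref out{std::vector<int64_t>(dst_shape.size(), 1),
            std::vector<int64_t>(dst_shape.size(), 0)};
    for (std::size_t i = 0; i < src.shape.size(); ++i) {
        const std::size_t d = i + offset;  // aligned from the trailing dim
        assert(src.shape[i] == dst_shape[d] || src.shape[i] == 1);
        out.shape[d] = src.shape[i];
        out.strides[d] =
                (src.shape[i] == 1 && dst_shape[d] != 1) ? 0 : src.strides[i];
    }
    src = out;
}
```

Applied to src with shape {2, 1, 1, 3} and strides {3, 3, 3, 1} against dst shape {2, 2, 2, 1, 3}, the sketch yields shape {1, 2, 1, 1, 3} and strides {0, 3, 0, 3, 1}, matching the [After] rows.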
bool cloudViewer::core::Indexer::CanUse32BitIndexing () const
Returns true iff the maximum offsets in bytes are smaller than 2^31 - 1.
Definition at line 198 of file Indexer.cpp.
References inputs_, max(), ndims_, num_inputs_, num_outputs_, NumWorkloads(), outputs_, and primary_shape_.
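One way to implement the check described above, sketched under assumptions: every operand is iterated over the same primary shape, strides are non-negative byte strides, and the largest offset is reached when every index is at its maximum. CanUse32BitIndexingSketch is a hypothetical free function; the real member function works on inputs_, outputs_ and primary_shape_ and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sketch: all operands share primary_shape; each operand has its
// own byte strides (assumed non-negative).
bool CanUse32BitIndexingSketch(
        const std::vector<int64_t>& primary_shape,
        const std::vector<std::vector<int64_t>>& operand_byte_strides) {
    const int64_t max_value = std::numeric_limits<int32_t>::max();  // 2^31 - 1
    int64_t num_workloads = 1;
    for (int64_t s : primary_shape) num_workloads *= s;
    if (num_workloads > max_value) return false;
    for (const auto& strides : operand_byte_strides) {
        assert(strides.size() == primary_shape.size());
        // Largest reachable byte offset: every index at its maximum (size - 1).
        int64_t max_offset = 0;
        for (std::size_t d = 0; d < primary_shape.size(); ++d) {
            if (primary_shape[d] > 0) {
                max_offset += (primary_shape[d] - 1) * strides[d];
            }
        }
        if (max_offset >= max_value) return false;  // must be < 2^31 - 1
    }
    return true;
}
```

A float32 tensor of shape [1024, 1024] passes; one of shape [65536, 16384] (4 GiB) does not, even though its number of workloads (2^30) still fits in 32 bits.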
void cloudViewer::core::Indexer::CoalesceDimensions () [protected]
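Merge adjacent dimensions if either dimension has size 1, or if, for every input and output, the stride of the outer dimension equals the size times the stride of the inner dimension (stride[outer] == shape[inner] * stride[inner]), so that the pair can be traversed as a single dimension.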
Definition at line 425 of file Indexer.cpp.
References cloudViewer::core::TensorRef::byte_strides_, inputs_, cloudViewer::core::TensorRef::ndims_, ndims_, num_inputs_, num_outputs_, outputs_, primary_shape_, stride, UpdateContiguousFlags(), and UpdatePrimaryStrides().
Referenced by Indexer(), and ShrinkDim().
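The merging rule can be sketched as follows, assuming dimensions are stored in C order (the last dimension varies fastest) and that every input and output shares one shape. Layout and CoalesceSketch are hypothetical names; the real method works in place on primary_shape_ and the operands' byte_strides_, and its dimension order may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: dims in C order (last dim varies fastest); one shared
// shape and one stride vector per operand (inputs and outputs alike).
struct Layout {
    std::vector<int64_t> shape;
    std::vector<std::vector<int64_t>> strides;
};

Layout CoalesceSketch(const Layout& in) {
    if (in.shape.empty()) return in;
    Layout out;
    out.shape = {in.shape[0]};
    for (const auto& s : in.strides) out.strides.push_back({s[0]});
    for (std::size_t d = 1; d < in.shape.size(); ++d) {
        const int64_t outer = out.shape.back();  // size of the merged-so-far dim
        const int64_t inner = in.shape[d];
        // Mergeable if either size is 1, or if for every operand
        // stride[outer] == shape[inner] * stride[inner].
        bool mergeable = (outer == 1 || inner == 1);
        if (!mergeable) {
            mergeable = true;
            for (std::size_t k = 0; k < in.strides.size(); ++k) {
                if (out.strides[k].back() != inner * in.strides[k][d]) {
                    mergeable = false;
                }
            }
        }
        if (mergeable) {
            // The merged dim steps like the inner one; a size-1 inner dim's
            // stride is irrelevant, so the outer stride is kept in that case.
            if (inner != 1) {
                for (std::size_t k = 0; k < in.strides.size(); ++k) {
                    out.strides[k].back() = in.strides[k][d];
                }
            }
            out.shape.back() = outer * inner;
        } else {
            out.shape.push_back(inner);
            for (std::size_t k = 0; k < in.strides.size(); ++k) {
                out.strides[k].push_back(in.strides[k][d]);
            }
        }
    }
    return out;
}
```

For a contiguous [2, 3, 4] tensor with strides {12, 4, 1}, the sketch collapses everything into one dimension of size 24. Adding a second operand broadcast along dimension 0 (strides {0, 4, 1}) blocks the first merge, so the result is shape {2, 12} with strides {12, 1} and {0, 1}.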
TensorRef & cloudViewer::core::Indexer::GetInput (int64_t i) [inline]
Returns input TensorRef.
const TensorRef & cloudViewer::core::Indexer::GetInput (int64_t i) const [inline]
Definition at line 352 of file Indexer.h.
References inputs_, LogError, and num_inputs_.
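CLOUDVIEWER_HOST_DEVICE char * cloudViewer::core::Indexer::GetInputPtr (int64_t input_idx, int64_t workload_idx) const [inline]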
Get input Tensor data pointer based on workload_idx.
| input_idx | Input tensor index. |
| workload_idx | The index of the compute workload, similar to thread_id, if a thread only processes one workload. |
Definition at line 406 of file Indexer.h.
References GetWorkloadDataPtr(), inputs_, inputs_contiguous_, and num_inputs_.
Referenced by cloudViewer::core::AdvancedIndexer::GetIndexedOffset(), cloudViewer::core::AdvancedIndexer::GetInputPtr(), and cloudViewer::core::kernel::CPUArgReductionEngine::LaunchArgReductionParallelDim().
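template<typename T> CLOUDVIEWER_HOST_DEVICE T * cloudViewer::core::Indexer::GetInputPtr (int64_t input_idx, int64_t workload_idx) const [inline]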
Get input Tensor data pointer based on workload_idx.
| input_idx | Input tensor index. |
| workload_idx | The index of the compute workload, similar to thread_id, if a thread only processes one workload. |
Note: Assumes that sizeof(T) matches the input's dtype size, but does not check this constraint for performance reasons.
Definition at line 424 of file Indexer.h.
References inputs_, inputs_contiguous_, and num_inputs_.
TensorRef & cloudViewer::core::Indexer::GetOutput () [inline]
Returns output TensorRef. Only works if there's only one output. Equivalent to GetOutput(0).
Definition at line 378 of file Indexer.h.
References LogError, and num_outputs_.
Referenced by GetOutput().
const TensorRef & cloudViewer::core::Indexer::GetOutput () const [inline]
Definition at line 385 of file Indexer.h.
References GetOutput(), LogError, and num_outputs_.
TensorRef & cloudViewer::core::Indexer::GetOutput (int64_t i) [inline]
Returns output TensorRef.
const TensorRef & cloudViewer::core::Indexer::GetOutput (int64_t i) const [inline]
Definition at line 368 of file Indexer.h.
References LogError, num_outputs_, and outputs_.
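CLOUDVIEWER_HOST_DEVICE char * cloudViewer::core::Indexer::GetOutputPtr (int64_t output_idx, int64_t workload_idx) const [inline]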
Get output Tensor data pointer based on workload_idx.
| output_idx | Output tensor index. |
| workload_idx | The index of the compute workload, similar to thread_id, if a thread only processes one workload. |
Definition at line 461 of file Indexer.h.
References GetWorkloadDataPtr(), outputs_, and outputs_contiguous_.
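template<typename T> CLOUDVIEWER_HOST_DEVICE T * cloudViewer::core::Indexer::GetOutputPtr (int64_t output_idx, int64_t workload_idx) const [inline]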
Get output Tensor data pointer based on workload_idx.
| output_idx | Output tensor index. |
| workload_idx | The index of the compute workload, similar to thread_id, if a thread only processes one workload. |
Definition at line 474 of file Indexer.h.
References outputs_, and outputs_contiguous_.
CLOUDVIEWER_HOST_DEVICE char * cloudViewer::core::Indexer::GetOutputPtr (int64_t workload_idx) const [inline]
Get output Tensor data pointer based on workload_idx.
| workload_idx | The index of the compute workload, similar to thread_id, if a thread only processes one workload. |
Definition at line 438 of file Indexer.h.
References GetWorkloadDataPtr(), outputs_, and outputs_contiguous_.
Referenced by cloudViewer::core::AdvancedIndexer::GetOutputPtr(), and cloudViewer::core::kernel::CPUArgReductionEngine::LaunchArgReductionParallelDim().
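template<typename T> CLOUDVIEWER_HOST_DEVICE T * cloudViewer::core::Indexer::GetOutputPtr (int64_t workload_idx) const [inline]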
Get output Tensor data pointer based on workload_idx.
| workload_idx | The index of the compute workload, similar to thread_id, if a thread only processes one workload. |
Note: Assumes that sizeof(T) matches the output's dtype size, but does not check this constraint for performance reasons.
Definition at line 451 of file Indexer.h.
References outputs_, and outputs_contiguous_.
Indexer cloudViewer::core::Indexer::GetPerOutputIndexer (int64_t output_idx) const
Get a sub-indexer that loops through all inputs corresponding to a single output.
Definition at line 303 of file Indexer.cpp.
References cloudViewer::core::TensorRef::byte_strides_, cloudViewer::core::TensorRef::data_ptr_, GetPrimaryShape(), inputs_, IsReductionDim(), cloudViewer::core::MAX_DIMS, ndims_, num_inputs_, num_outputs_, outputs_, primary_shape_, cloudViewer::core::TensorRef::shape_, stride, UpdateContiguousFlags(), and UpdatePrimaryStrides().
int64_t * cloudViewer::core::Indexer::GetPrimaryShape () [inline]
Definition at line 317 of file Indexer.h.
References primary_shape_.
const int64_t * cloudViewer::core::Indexer::GetPrimaryShape () const [inline]
Returns the Indexer's primary shape; the Indexer can be iterated with this shape.
Definition at line 316 of file Indexer.h.
References primary_shape_.
Referenced by GetPerOutputIndexer().
const int64_t * cloudViewer::core::Indexer::GetPrimaryStrides () const [inline]
Returns the Indexer's primary strides, with which the Indexer can be iterated. They are always the default strides of primary_shape_.
Definition at line 321 of file Indexer.h.
References primary_strides_.
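CLOUDVIEWER_HOST_DEVICE char * cloudViewer::core::Indexer::GetWorkloadDataPtr (const TensorRef & tr, bool tr_contiguous, int64_t workload_idx) const [inline, protected]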
Get the data pointer from a TensorRef for workload_idx. Note: this could be optimized by computing all input and output pointers together.
Definition at line 542 of file Indexer.h.
References cloudViewer::core::TensorRef::byte_strides_, cloudViewer::core::TensorRef::data_ptr_, cloudViewer::core::TensorRef::dtype_byte_size_, ndims_, offset, and primary_strides_.
Referenced by GetInputPtr(), and GetOutputPtr().
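template<typename T> CLOUDVIEWER_HOST_DEVICE T * cloudViewer::core::Indexer::GetWorkloadDataPtr (const TensorRef & tr, bool tr_contiguous, int64_t workload_idx) const [inline, protected]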
Get the data pointer from a TensorRef for workload_idx. Note: this could be optimized by computing all input and output pointers together.
Note: Assumes that sizeof(T) matches the data's dtype size, but does not check this constraint for performance reasons.
Definition at line 572 of file Indexer.h.
References cloudViewer::core::TensorRef::byte_strides_, cloudViewer::core::TensorRef::data_ptr_, ndims_, offset, and primary_strides_.
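The mapping from a flat workload_idx to a data pointer can be sketched as below. This is a hypothetical reconstruction from the reference lists above (byte_strides_, dtype_byte_size_, primary_strides_) and the tr_contiguous parameter, not the actual implementation: a contiguous operand is addressed directly as workload_idx times the dtype size, and otherwise workload_idx is decomposed into per-dimension coordinates with the primary strides, which are then weighted by the operand's own byte strides. DefaultStrides and WorkloadByteOffset are illustration names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Default (row-major) strides of a shape, in elements: the last dim has stride 1.
std::vector<int64_t> DefaultStrides(const std::vector<int64_t>& shape) {
    std::vector<int64_t> strides(shape.size());
    int64_t stride = 1;
    for (std::size_t i = shape.size(); i-- > 0;) {
        strides[i] = stride;
        stride *= shape[i];
    }
    return strides;
}

// Hypothetical sketch: byte offset of one operand for a flat workload index.
// Assumes no zero-sized dims (a zero primary stride would divide by zero).
int64_t WorkloadByteOffset(int64_t workload_idx, bool contiguous,
                           int64_t dtype_byte_size,
                           const std::vector<int64_t>& primary_strides,
                           const std::vector<int64_t>& byte_strides) {
    if (contiguous) return workload_idx * dtype_byte_size;  // fast path
    int64_t offset = 0;
    for (std::size_t i = 0; i < primary_strides.size(); ++i) {
        // workload_idx / primary_strides[i] is the coordinate along dim i.
        offset += workload_idx / primary_strides[i] * byte_strides[i];
        workload_idx %= primary_strides[i];
    }
    return offset;
}
```

With primary shape [2, 3] and a float32 input broadcast from shape [3] (byte strides {0, 4}), workload 4 has coordinates (1, 1) and therefore reads the input at byte offset 4, the same element as workload 1.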
bool cloudViewer::core::Indexer::IsFinalOutput () const [inline]
Definition at line 299 of file Indexer.h.
References final_output_.
bool cloudViewer::core::Indexer::IsReductionDim (int64_t dim) const [inline]
Returns true if the dim-th dimension is reduced.
Definition at line 394 of file Indexer.h.
References cloudViewer::core::TensorRef::byte_strides_, outputs_, and primary_shape_.
Referenced by GetPerOutputIndexer(), and SplitLargestDim().
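int64_t cloudViewer::core::Indexer::NumDims () const [inline]
Returns the number of dimensions of the Indexer.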
int64_t cloudViewer::core::Indexer::NumInputs () const [inline]
Number of input Tensors.
int64_t cloudViewer::core::Indexer::NumOutputElements () const
Returns the number of output elements.
Definition at line 414 of file Indexer.cpp.
References ndims_, outputs_, and primary_shape_.
Referenced by cloudViewer::core::kernel::CPUReductionEngine::Run(), and cloudViewer::core::kernel::CPUArgReductionEngine::Run().
int64_t cloudViewer::core::Indexer::NumOutputs () const [inline]
Number of output Tensors.
int64_t cloudViewer::core::Indexer::NumReductionDims () const
Returns the number of reduction dimensions.
Definition at line 395 of file Indexer.cpp.
int64_t cloudViewer::core::Indexer::NumWorkloads () const
Returns the total number of workloads (e.g. computations) needed for the op. The scheduler runs these workloads on parallel threads.
For non-reduction ops, NumWorkloads() equals the number of output elements (e.g. for broadcasting ops).
For reduction ops, NumWorkloads() equals the number of input elements. Currently, broadcasting and reduction cannot be mixed in one op kernel.
Definition at line 406 of file Indexer.cpp.
References ndims_, and primary_shape_.
Referenced by CanUse32BitIndexing(), cloudViewer::core::kernel::CPUArgReductionEngine::LaunchArgReductionParallelDim(), cloudViewer::core::AdvancedIndexer::NumWorkloads(), and cloudViewer::core::ParallelForSYCL().
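The relationship between workloads and output elements can be sketched with plain vectors. The sketch assumes, as the ReductionRestride description states, that a reduced dimension has output stride 0, and it treats a dimension as reduced when its output stride is 0 and its size is greater than 1; that test and the names NumWorkloadsSketch, IsReductionDimSketch and NumOutputElementsSketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Every workload is one point of the primary shape.
int64_t NumWorkloadsSketch(const std::vector<int64_t>& primary_shape) {
    int64_t n = 1;
    for (int64_t s : primary_shape) n *= s;
    return n;
}

// Assumed test: reduced dims have output stride 0 (see ReductionRestride).
bool IsReductionDimSketch(const std::vector<int64_t>& primary_shape,
                          const std::vector<int64_t>& output_byte_strides,
                          std::size_t dim) {
    return output_byte_strides[dim] == 0 && primary_shape[dim] > 1;
}

// Output elements: the product of the sizes of all non-reduced dims.
int64_t NumOutputElementsSketch(const std::vector<int64_t>& primary_shape,
                                const std::vector<int64_t>& output_byte_strides) {
    int64_t n = 1;
    for (std::size_t d = 0; d < primary_shape.size(); ++d) {
        if (!IsReductionDimSketch(primary_shape, output_byte_strides, d)) {
            n *= primary_shape[d];
        }
    }
    return n;
}
```

Summing a [4, 5] float32 input over dimension 1 with keepdim gives 20 workloads (one per input element) but only 4 output elements, whereas an elementwise op over the same shape has 20 of each.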
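static void cloudViewer::core::Indexer::ReductionRestride (TensorRef & dst, int64_t src_ndims, const int64_t * src_shape, const SizeVector & reduction_dims) [static, protected]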
Symmetric to BroadcastRestride: sets the stride of each reduced dimension of the output to 0. Currently only the keepdim=true case is supported.
Definition at line 602 of file Indexer.cpp.
References cloudViewer::core::TensorRef::byte_strides_, LogError, cloudViewer::core::TensorRef::ndims_, and cloudViewer::core::TensorRef::shape_.
Referenced by Indexer().
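A minimal sketch of why zero output strides implement a reduction: once a reduced dimension's output stride is 0, every input element along it maps to the same output element, so each workload can simply accumulate. ReductionRestrideSketch and SumKeepDim are hypothetical names, the example is limited to 2-D row-major float data and a sum, and a serial loop stands in for the parallel scheduler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch (keepdim=true only): zero the output stride of every
// reduced dim, so all input elements along it map to one output element.
std::vector<int64_t> ReductionRestrideSketch(
        std::vector<int64_t> out_strides,
        const std::vector<int64_t>& reduction_dims) {
    for (int64_t d : reduction_dims) out_strides[d] = 0;
    return out_strides;
}

// Sum-reduce a row-major [rows, cols] input: one workload per input element,
// each accumulating into the output element its restrided offset points to.
std::vector<float> SumKeepDim(const std::vector<float>& in, int64_t rows,
                              int64_t cols,
                              const std::vector<int64_t>& reduction_dims) {
    assert(static_cast<int64_t>(in.size()) == rows * cols);
    int64_t out_rows = rows, out_cols = cols;
    for (int64_t d : reduction_dims) {
        if (d == 0) out_rows = 1; else out_cols = 1;
    }
    std::vector<float> out(out_rows * out_cols, 0.0f);
    // Row-major strides of the keepdim output, then restrided.
    const std::vector<int64_t> s =
            ReductionRestrideSketch({out_cols, 1}, reduction_dims);
    for (int64_t w = 0; w < rows * cols; ++w) {
        const int64_t r = w / cols, c = w % cols;
        out[r * s[0] + c * s[1]] += in[w];
    }
    return out;
}
```

For the 2 x 3 input {1, 2, 3, 4, 5, 6}, reducing dimension 1 gives {6, 15} and reducing dimension 0 gives {5, 7, 9}.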
void cloudViewer::core::Indexer::ReorderDimensions (const SizeVector & reduction_dims) [protected]
Definition at line 491 of file Indexer.cpp.
References cloudViewer::core::TensorRef::byte_strides_, inputs_, ndims_, num_inputs_, num_outputs_, outputs_, cloudViewer::core::TensorRef::Permute(), cloudViewer::core::SmallVectorTemplateCommon< T, typename >::rbegin(), cloudViewer::core::SmallVectorTemplateCommon< T, typename >::rend(), and std::swap().
Referenced by Indexer().
bool cloudViewer::core::Indexer::ShouldAccumulate () const [inline]
Definition at line 297 of file Indexer.h.
References accumulate_.
void cloudViewer::core::Indexer::ShrinkDim (int64_t dim, int64_t start, int64_t size)
Shrink iteration to a specific range in a specific dimension.
| dim | The dimension to shrink. |
| start | Starting index (inclusive) for dimension dim. No dimension wrapping is available. |
| size | The size to iterate in dimension dim. |
Definition at line 364 of file Indexer.cpp.
References CoalesceDimensions(), cloudViewer::core::TensorRef::data_ptr_, inputs_, LogError, ndims_, num_inputs_, num_outputs_, outputs_, primary_shape_, size, UpdateContiguousFlags(), and UpdatePrimaryStrides().
Referenced by SplitLargestDim().
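Shrinking a dimension amounts to moving the base pointer and shortening that dimension. The sketch below shows this for a single strided view; View and ShrinkDimSketch are hypothetical names. Judging from its reference list, the real method applies the change to every input and output and to primary_shape_, and then refreshes the primary strides and contiguity flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical strided view over raw bytes.
struct View {
    char* data;
    std::vector<int64_t> shape;
    std::vector<int64_t> byte_strides;
};

// Restrict dimension dim to indices [start, start + size). No wrapping.
void ShrinkDimSketch(View& v, std::size_t dim, int64_t start, int64_t size) {
    assert(start >= 0 && size >= 0 && start + size <= v.shape[dim]);
    v.data += start * v.byte_strides[dim];  // skip the first `start` slices
    v.shape[dim] = size;
}
```

Shrinking a 4 x 3 float32 matrix (byte strides {12, 4}) with dim = 0, start = 1, size = 2 advances the pointer by 12 bytes, so the view starts at element 3 and ends at element 8.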
std::unique_ptr< Indexer > cloudViewer::core::Indexer::SplitLargestDim ()
Splits the indexer so that the dimension with the largest span is divided into two halves. The returned new indexer iterates over the first half, while the current indexer iterates over the second half.
Definition at line 238 of file Indexer.cpp.
References accumulate_, copy, Indexer(), inputs_, IsReductionDim(), LogError, ndims_, num_inputs_, num_outputs_, outputs_, primary_shape_, ShrinkDim(), and size.
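The splitting step can be sketched on index ranges. SubRange and SplitLargestDimSketch are hypothetical, and measuring the span as (size - 1) * byte stride over all operands is an assumption. The accumulate and final_output flags illustrate a consequence suggested by the reference list (IsReductionDim(), accumulate_): if the split dimension is a reduction dimension, both halves write to the same outputs, so in the sketch the second half is marked to accumulate and the first half is no longer marked as final output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sub-iteration space: per-dim start and size over the primary
// shape, plus the two flags a split may need to set.
struct SubRange {
    std::vector<int64_t> start, size;
    bool accumulate = false;   // add into the output instead of overwriting it
    bool final_output = true;  // this piece produces the finished result
};

// Split the dim with the largest byte span, (size - 1) * stride, over all
// operands. Returns the first half; r keeps the second half.
SubRange SplitLargestDimSketch(
        SubRange& r, const std::vector<std::vector<int64_t>>& byte_strides,
        const std::vector<int64_t>& output_byte_strides) {
    std::size_t dim = 0;
    int64_t max_span = -1;
    for (std::size_t d = 0; d < r.size.size(); ++d) {
        for (const auto& s : byte_strides) {
            const int64_t span = (r.size[d] - 1) * s[d];
            if (span > max_span) {
                max_span = span;
                dim = d;
            }
        }
    }
    assert(r.size[dim] >= 2);
    // Reduced dim (output stride 0): both halves write the same outputs.
    const bool overlaps = output_byte_strides[dim] == 0 && r.size[dim] > 1;
    SubRange first = r;
    first.size[dim] = r.size[dim] / 2;
    first.final_output = first.final_output && !overlaps;
    r.start[dim] += first.size[dim];
    r.size[dim] -= first.size[dim];
    r.accumulate = r.accumulate || overlaps;
    return first;
}
```

For a contiguous [4, 1000] float32 elementwise op, dimension 0 has the largest span and is split into two [2, 1000] halves with no accumulation. For a sum over dimension 0 of a [1000, 4] input, the reduced dimension is split, so the second half accumulates.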
IndexerIterator cloudViewer::core::Indexer::SplitTo32BitIndexing () const
Returns an iterator of Indexers, each of which can be indexed in 32 bits.
Definition at line 234 of file Indexer.cpp.
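Conceptually, 32-bit splitting can be done by repeatedly halving any piece whose byte offsets do not fit in 32 bits. The sketch below illustrates that idea on a single strided dimension and returns a std::vector instead of an IndexerIterator; Chunk and SplitTo32BitSketch are hypothetical names, and the real method may organize the splitting differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

struct Chunk {
    int64_t start;
    int64_t size;
};

// Hypothetical 1-D sketch: halve any chunk whose largest byte offset (relative
// to the chunk start) is not smaller than 2^31 - 1, until every chunk fits.
std::vector<Chunk> SplitTo32BitSketch(int64_t n, int64_t byte_stride) {
    assert(n >= 0 && byte_stride >= 0);
    const int64_t max_value = std::numeric_limits<int32_t>::max();
    std::vector<Chunk> done;
    std::vector<Chunk> todo{{0, n}};
    while (!todo.empty()) {
        const Chunk c = todo.back();
        todo.pop_back();
        if (c.size <= 1 || (c.size - 1) * byte_stride < max_value) {
            done.push_back(c);
        } else {
            const int64_t half = c.size / 2;
            todo.push_back({c.start + half, c.size - half});  // handled later
            todo.push_back({c.start, half});                  // handled next
        }
    }
    return done;  // chunks in ascending order of start
}
```

A 1-D float32 array of 2^30 elements (4 GiB) is split into two halves of 2^29 elements, each with a largest offset of 2^31 - 4 bytes; an array of 1000 elements is returned as a single chunk.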
void cloudViewer::core::Indexer::UpdateContiguousFlags () [protected]
Update inputs_contiguous_ and outputs_contiguous_.
Definition at line 565 of file Indexer.cpp.
References inputs_, inputs_contiguous_, cloudViewer::core::TensorRef::IsContiguous(), num_inputs_, num_outputs_, outputs_, and outputs_contiguous_.
Referenced by CoalesceDimensions(), GetPerOutputIndexer(), Indexer(), and ShrinkDim().
void cloudViewer::core::Indexer::UpdatePrimaryStrides () [protected]
Update primary_strides_ based on primary_shape_.
Definition at line 556 of file Indexer.cpp.
References ndims_, primary_shape_, primary_strides_, and stride.
Referenced by CoalesceDimensions(), GetPerOutputIndexer(), Indexer(), and ShrinkDim().
bool cloudViewer::core::Indexer::accumulate_ = false [protected]
Whether the kernel should accumulate into the output. Only relevant for CUDA reductions.
Definition at line 637 of file Indexer.h.
Referenced by ShouldAccumulate(), and SplitLargestDim().
bool cloudViewer::core::Indexer::final_output_ = true [protected]
Whether this iterator produces the actual output, as opposed to something that will be accumulated further. Only relevant for CUDA reductions.
Definition at line 633 of file Indexer.h.
Referenced by IsFinalOutput().
TensorRef cloudViewer::core::Indexer::inputs_[MAX_INPUTS] [protected]
Array of input TensorRefs.
Definition at line 599 of file Indexer.h.
Referenced by CanUse32BitIndexing(), CoalesceDimensions(), GetInput(), GetInputPtr(), GetPerOutputIndexer(), Indexer(), ReorderDimensions(), ShrinkDim(), SplitLargestDim(), and UpdateContiguousFlags().
bool cloudViewer::core::Indexer::inputs_contiguous_[MAX_INPUTS] [protected]
Array of contiguous flags for all input TensorRefs.
Definition at line 605 of file Indexer.h.
Referenced by GetInputPtr(), and UpdateContiguousFlags().
int64_t cloudViewer::core::Indexer::ndims_ = 0 [protected]
Indexer's global number of dimensions.
Definition at line 628 of file Indexer.h.
Referenced by CanUse32BitIndexing(), CoalesceDimensions(), GetPerOutputIndexer(), GetWorkloadDataPtr(), Indexer(), NumDims(), NumOutputElements(), NumReductionDims(), NumWorkloads(), ReorderDimensions(), ShrinkDim(), SplitLargestDim(), and UpdatePrimaryStrides().
int64_t cloudViewer::core::Indexer::num_inputs_ = 0 [protected]
Number of input and output Tensors.
Definition at line 595 of file Indexer.h.
Referenced by CanUse32BitIndexing(), CoalesceDimensions(), GetInput(), GetInputPtr(), GetPerOutputIndexer(), Indexer(), NumInputs(), ReorderDimensions(), ShrinkDim(), SplitLargestDim(), and UpdateContiguousFlags().
int64_t cloudViewer::core::Indexer::num_outputs_ = 0 [protected]
Definition at line 596 of file Indexer.h.
Referenced by CanUse32BitIndexing(), CoalesceDimensions(), GetOutput(), GetPerOutputIndexer(), Indexer(), NumOutputs(), ReorderDimensions(), ShrinkDim(), SplitLargestDim(), and UpdateContiguousFlags().
TensorRef cloudViewer::core::Indexer::outputs_[MAX_OUTPUTS] [protected]
Array of output TensorRefs.
Definition at line 602 of file Indexer.h.
Referenced by CanUse32BitIndexing(), CoalesceDimensions(), GetOutput(), GetOutputPtr(), GetPerOutputIndexer(), Indexer(), IsReductionDim(), NumOutputElements(), NumReductionDims(), ReorderDimensions(), ShrinkDim(), SplitLargestDim(), and UpdateContiguousFlags().
bool cloudViewer::core::Indexer::outputs_contiguous_[MAX_OUTPUTS] [protected]
Array of contiguous flags for all output TensorRefs.
Definition at line 608 of file Indexer.h.
Referenced by GetOutputPtr(), and UpdateContiguousFlags().
int64_t cloudViewer::core::Indexer::primary_shape_[MAX_DIMS] [protected]
Indexer's global shape. The number of elements in this shape equals NumWorkloads() for the Indexer.
Definition at line 621 of file Indexer.h.
Referenced by CanUse32BitIndexing(), CoalesceDimensions(), GetPerOutputIndexer(), GetPrimaryShape(), Indexer(), IsReductionDim(), NumOutputElements(), NumWorkloads(), ShrinkDim(), SplitLargestDim(), and UpdatePrimaryStrides().
int64_t cloudViewer::core::Indexer::primary_strides_[MAX_DIMS] [protected]
The default strides for primary_shape_ for internal use only. Used to compute the actual strides and ultimately the index offsets.
Definition at line 625 of file Indexer.h.
Referenced by GetPrimaryStrides(), GetWorkloadDataPtr(), and UpdatePrimaryStrides().