Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add fold_node RFC #1526

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

kboyarinov
Copy link
Contributor

Description

Add RFC document for new fold_node in the Flow Graph.

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

Copy link
Contributor

@akukanov akukanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several other design questions which I think need to be discussed. I split those into separate comments to organize discussions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, since the node is supposed to handle multiple input streams, we need to decide how similar these streams must be. Should the same initial value be used, or should it be per stream? Should the same binary operation be used, or can/should it be separate for each stream? Should the types of input and output be the same for all streams, or can those differ as well?

Of course I see that in the proposal all those things are the same for all streams. But it is not discussed, rather assumed as given.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use cases should help in answering this question. Probably, the same type and same initializer value will make sense for most use cases, but we should add support for this choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A related question is on the number of input and output ports. Do we expect all streams to go into a single port, or should there be a single port per stream, or even many-to-many? Of course we can add a preceding indexer_node for the port-per-stream setup in front of a single input port (and it would also nicely create tagged_msges). But if the 1-1 correspondence is expected to be the typical use, maybe we would not want to complicate it. Also there is no "deindexer" node to put after a single output port.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting point. We have a split_node that is complementary to join_node, joining and splitting tuples. But nothing that is complementary to an indexer_node. The indexer_node was introduced to tagged messages that came into a single point of a functional_node. So far, I don't think we've seen the need to automatically send indexed messages to different output ports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we see the case of folding a single stream to be important enough, and would it be sufficiently easy to do with the node designed to handle multiple streams? Perhaps some code samples could show that.

Comment on lines +40 to +41
as an input message together with the actual data to compute. It is proposed to use ``tbb::flow::tagged_msg`` class for that purpose
as an input for the ``fold_node`` that would be sent by the predecessor or by manual ``try_put`` calls:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tagged_msg was designed to hold data of multiple types, it's a 'variant' on steroids. If we decide to require a single data type for all streams, tagged_msg will likely be not that convenient to use and may also add some overheads. In other words, I think the way to combine the data with the stream index will depend on the overall node design.

Comment on lines +105 to +106
From the implementation perspective, there can be multiple approaches for implementing the ``fold_stream_end`` that affects the user API.
Some of them were considered in a [separate section](#fold_stream_end-implementation-approaches).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that all the approaches are based on a message counter. The question then is, at which point the total number of messages to be folded (the stream "size") becomes known. If it is known a priori, or if we want to compute partial folds for each K messages, the counter might be set even at construction. If the stream size is dynamic but known before the first message, we may require that the stream is somehow "initialized" to set that counter.

The most flexible, but also the most complicated and risky approach is to get the stream size at any point, or to get the end of stream message. If the message does not carry any counter, there are problems with potential "still in the flight" messages that might be missed. If the message does carry a counter, what to do if more messages were already received and folded? Also, the message type is kind of problematic, as both the stream size and the end of stream message are likely of different type than the stream data.

A possible design alternative is to have a dedicated port for the "end of stream" signals.

Comment on lines +125 to +126
In this case, having the internal counter not equal to _0_ indicates that there are still elements to process and once it is equal to _0_, the result
of the fold is considered full and can be propagated to the ``fold_node`` successors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the single counter for both expected (negative count) and received (positive count) messages, which triggers at 0, has a downside - it loses the information about the number of messages and cannot pass it down. This value was computed as a byproduct of folding, and in theory it might not be known anywhere else, so it could possibly be useful in the subsequent processing.

The same flexible approach can be applied for solving this as well:
* If the invocation of _fold operation_ with an object of type

It is also an open question, should ``input_type`` be allowed as a first, second, or both operands of _fold operation_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, there can be a separate optional argument for the stream index.

is well-formed - propagate the index of the stream to the body.
* Otherwise - the index of the input stream is not propagated to the _fold operation_.

### Should ``InputType`` and ``OutputType`` be different?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes, generally these should be different types. If I remember correctly, one of our motivating scenarios is computing a histogram for a data sequence; it is obviously of a different type than the data elements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is important to outline some use cases which motivate creation of the fold node. It will guide the design and help to prevent "suboptimal" decisions.

A few simplified use case ideas to consider:

  • compute the word/token count in a text file read line by line.
  • compute the sum of floating-point values, assuming that the sum might be many orders of magnitude bigger than some individual values.
  • "reduce by key": for a key-value sequence, compute the value sums for all different keys.


// Submit stream1
for (auto item : stream1) {
f_node.try_put(input_type{stream1_index, item});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f_node.try_put(input_type{stream1_index, item});
f_node.try_put(fold_input{stream1_index, item});

}

for (auto item : stream2) {
f_node.try_put(input_type{stream2_index, item});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f_node.try_put(input_type{stream2_index, item});
f_node.try_put(fold_input{stream2_index, item});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use cases should help in answering this question. Probably, the same type and same initializer value will make sense for most use cases, but we should add support for this choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting point. We have a split_node that is complementary to join_node, joining and splitting tuples. But nothing that is complementary to an indexer_node. The indexer_node was introduced to tagged messages that came into a single point of a functional_node. So far, I don't think we've seen the need to automatically send indexed messages to different output ports.

@vossmjp vossmjp added the RFC label Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants