Using Task.Yield to overcome ThreadPool starvation while implementing producer/consumer pattern
Answering the question: Task.Yield - real usages?
I proposed using Task.Yield to allow a pool thread to be reused by other tasks, in a pattern like this:
CancellationTokenSource cts;

void Start()
{
    cts = new CancellationTokenSource();

    // run the async operation
    var task = Task.Run(() => SomeWork(cts.Token), cts.Token);
    // wait for completion
    // after completion, handle the result / cancellation / errors
}

async Task<int> SomeWork(CancellationToken cancellationToken)
{
    int result = 0;

    bool loopAgain = true;
    while (loopAgain)
    {
        // do something ... meaning a substantial piece of work or a micro-batch here - not processing a single byte
        loopAgain = /* check for loop end && */ !cancellationToken.IsCancellationRequested;
        if (loopAgain)
        {
            // reschedule the task to the thread pool and free this thread for other waiting tasks
            await Task.Yield();
        }
    }
    cancellationToken.ThrowIfCancellationRequested();
    return result;
}

void Cancel()
{
    // request cancellation
    cts.Cancel();
}
But one user wrote
I don't think using Task.Yield to overcome ThreadPool starvation while
implementing producer/consumer pattern is a good idea. I suggest you
ask a separate question if you want to go into details as to why.
Does anybody know why this is not a good idea?
c# multithreading async-await task-parallel-library threadpool
I have no conclusive idea about the original commenter's motivation, but you should try to avoid having a busy loop waiting for data to arrive; instead, you should use a mechanism which allows you to trigger the processing.
– Lasse Vågsæther Karlsen
Nov 12 '18 at 13:33
I'd argue that hot loops are bad with or without adding async to the mix - I'd forgive it a lot more if it was await Task.Delay(50) or something, but it would be even better to use an async activation rather than checking in this way; there is the new "channels" API, for example (nuget.org/packages/System.Threading.Channels), which is designed for async producer/consumer scenarios
– Marc Gravell♦
Nov 12 '18 at 13:40
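To make the channel-based alternative concrete, here is a minimal sketch of the kind of async producer/consumer Marc describes; the names and sizes are illustrative, not taken from the original code:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

class ChannelDemo
{
    static async Task Main()
    {
        // Unbounded channel: the consumer awaits new items asynchronously
        // instead of spinning or calling Task.Yield in a loop.
        var channel = Channel.CreateUnbounded<int>();

        var consumer = Task.Run(async () =>
        {
            int sum = 0;
            // WaitToReadAsync parks the task without occupying a pool thread
            // until data arrives; it returns false once the writer completes.
            while (await channel.Reader.WaitToReadAsync())
                while (channel.Reader.TryRead(out int item))
                    sum += item;
            return sum;
        });

        for (int i = 1; i <= 100; i++)
            await channel.Writer.WriteAsync(i);
        channel.Writer.Complete();

        Console.WriteLine(await consumer); // 5050
    }
}
```

The key difference from the Task.Yield loop is that an idle consumer costs nothing: the continuation is only scheduled when there is actually something to read.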
@MaximT indeed - it is what I'm using for ordered message queues in SE.Redis: github.com/StackExchange/StackExchange.Redis/blob/master/src/…
– Marc Gravell♦
Nov 12 '18 at 16:20
It is the exact opposite of what the ThreadPool manager tries to do. It makes an effort to limit the number of active thread-pool threads to the ideal number in order to cut down on context-switching overhead. When you use Task.Yield you add context-switching overhead. If you have too many thread-pool threads that don't execute code efficiently (blocking too much), then use SetMinThreads().
– Hans Passant
Nov 12 '18 at 16:22
Well, of course that's the way it must work. No amount of affordable money is going to buy you a machine with a thousand processor cores. You can't slam the threadpool with that many jobs and expect instant magic. These are important details that belong in the question, btw.
– Hans Passant
Nov 13 '18 at 13:44
asked Nov 12 '18 at 13:30 by Maxim T; edited Nov 13 '18 at 7:58
2 Answers
There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.
Using the ThreadPool doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield(). Thread switching is rather expensive; by doing await Task.Yield() in a tight loop you add significant overhead. Besides, you should never take over the whole ThreadPool, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning option that requests that the task not run on a ThreadPool thread (rather, it creates a normal thread with new Thread() behind the scenes).

That said, using a custom TaskScheduler with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await continuations would be posted on the same thread, which should help reduce the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler.
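To illustrate the TaskCreationOptions.LongRunning point, here is a small sketch; the work inside the delegate is a placeholder, but the scheduling behavior shown is the documented one:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LongRunningDemo
{
    static void Main()
    {
        // LongRunning hints the default scheduler to run the work on a
        // dedicated thread instead of borrowing one from the ThreadPool,
        // so a long CPU-bound job doesn't starve pool threads.
        var task = Task.Factory.StartNew(() =>
        {
            bool isPoolThread = Thread.CurrentThread.IsThreadPoolThread;
            Console.WriteLine($"On thread pool thread: {isPoolThread}"); // False
        }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);

        task.Wait();
    }
}
```

A plain Task.Run of the same delegate would print True, since Task.Run always queues to the pool.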
Still, depending on the particular scenario, it's usually better to use an existing, well-established and tested tool. To name a few: the Parallel class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.
There is also a whole range of industrial-strength solutions for the publish-subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS), etc.).
I know about all the well-established solutions - I have used some of them myself. In terms of performance, Kafka is the best in this range, and NATS as well. But the performance gains usually come at the expense of reliability. For reliable message processing you need to read messages from a durable store rather than buffer them in memory. And the tasks are usually not as simple as processing a single byte; they usually take some milliseconds. I usually use WorkStealingTaskScheduler for long CPU-bound tasks (it has a pool of custom threads). So it all depends on the context where it is used.
– Maxim T
Nov 13 '18 at 3:49
The key point in your answer: don't use Task.Yield in a tight loop. I agree 100%. But if the time taken to process each iteration exceeds the cost of an additional ThreadPool.QueueUserWorkItem, the performance decrease is negligible, while responsiveness and task cooperation improve. By the way, it is easy to test with a small custom setup. In a tight loop the decrease is around 100%, but if some job (around several milliseconds) is done in each iteration, the decrease is less than 10%.
– Maxim T
Nov 13 '18 at 3:53
@MaximT, in any case I wouldn't overload the default ThreadPool with a million small computational tasks. But let's say you have a custom pool, e.g. one created with WorkStealingTaskScheduler. It would be interesting to see actual benchmarks. E.g., have 10000 tasks each calculate the first 10000 digits of Pi. Then compare it to ThreadPoolTaskScheduler (make sure to fix the number of threads with SetMinThreads/SetMaxThreads). Then compare it to a task scheduler with actual thread affinity (AFAIR, WorkStealingTaskScheduler isn't affine for await continuations).
– noseratio
Nov 13 '18 at 4:20
In my experience the performance killer is not context switching itself, but very frequent context switching. If the job is very short, like calculating digits of Pi, then batching 10000 iterations into one ThreadPool.QueueUserWorkItem will solve the problem. The WorkStealingTaskScheduler is used for longer CPU-bound synchronous tasks without async/await in them. Look at github.com/BBGONE/REBUS-TaskCoordinator/blob/master/Showdown/… to see how a mixed job is split into subtasks (the HandleLongRunMessage method).
– Maxim T
Nov 13 '18 at 5:04
[The price of a context switch] = [context switch duration] / ([job duration] + [context switch duration]). The shorter the job, the bigger the price. For very long jobs the thread pool is not a solution anyway. The drawback of the ThreadAffinityTaskScheduler is that it is not portable to .NET Core - it is platform dependent. And this problem could be solved with micro-batching - the batch will be processed on a single thread.
– Maxim T
Nov 13 '18 at 7:10
After a bit of debate on the issue with other users who are worried about context switching and its influence on performance, I see what they are worried about.
But I meant the do something ... inside the loop to be a substantial task - usually in the form of a message handler which reads a message from the queue and processes it. Message handlers are usually user defined, and the message bus executes them using some sort of dispatcher. A user can implement a handler which executes synchronously (nobody knows what the user will do), and without Task.Yield that would block the thread while processing those synchronous tasks in a loop.
Not to be empty-worded, I added tests to GitHub: https://github.com/BBGONE/TestThreadAffinity
They compare the ThreadAffinityTaskScheduler, .NET ThreadScheduler with BlockingCollection and .NET ThreadScheduler with Threading.Channels.
The tests show that for ultra-short jobs the performance degradation is around 15%. To use Task.Yield without performance degradation (even a small one), avoid extremely short tasks; if the task is too short, combine shorter tasks into a bigger batch.
[The price of context switch] = [context switch duration] / ([job duration]+[context switch duration]).
In that case the influence of switching tasks on performance is negligible, while it adds better task cooperation and system responsiveness.
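The micro-batching idea above can be sketched as follows; the batch size and the queue used as a message source are hypothetical stand-ins for a real message pipeline:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class MicroBatchDemo
{
    const int BatchSize = 64; // illustrative; tune for your workload

    static async Task<int> ConsumeAsync(ConcurrentQueue<int> queue)
    {
        int processed = 0;
        while (!queue.IsEmpty)
        {
            // Drain up to BatchSize items before yielding, so the cost of
            // rescheduling is amortized over the whole batch.
            for (int i = 0; i < BatchSize && queue.TryDequeue(out int item); i++)
                processed += item; // stand-in for real per-message work

            // One yield per batch instead of one per message.
            await Task.Yield();
        }
        return processed;
    }

    static async Task Main()
    {
        var queue = new ConcurrentQueue<int>();
        for (int i = 0; i < 1000; i++) queue.Enqueue(1);
        Console.WriteLine(await ConsumeAsync(queue)); // 1000
    }
}
```

With the formula above, a batch of 64 multi-microsecond jobs makes the per-item share of the context-switch cost roughly 64 times smaller than yielding after every item.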
For long running tasks it is better to use a custom Scheduler which executes tasks on its own dedicated thread pool - (like the WorkStealingTaskScheduler).
For mixed jobs - which can contain different parts (short-running CPU-bound, asynchronous, and long-running code) - it is better to split the task into subtasks.
private async Task HandleLongRunMessage(TestMessage message, CancellationToken token = default(CancellationToken))
{
    // SHORT SYNCHRONOUS TASK - execute as is on the default thread (from the thread pool)
    CPU_TASK(message, 50);

    // IO-BOUND ASYNCHRONOUS TASK - used as is
    await Task.Delay(50);

    // BUT WRAP THE LONG SYNCHRONOUS TASK inside a Task
    // which is scheduled on the custom thread pool
    // (to save thread pool threads)
    await Task.Factory.StartNew(() =>
    {
        CPU_TASK(message, 100000);
    }, token, TaskCreationOptions.DenyChildAttach, _workStealingTaskScheduler);
}
The info on the channels: github.com/stephentoub/corefxlab/blob/master/src/…
– Maxim T
Nov 14 '18 at 12:48
It seems I figured it out. Although the performance is the same in all usages, channels have the benefit of a non-blocking wait for when the channel can be written to. This happens in bounded scenarios. A BlockingCollection blocks the thread, not allowing writes; Channels leave the thread free to be used by others.
– Maxim T
Nov 17 '18 at 12:54
I ported the test for Threading.Channels to CoreFX (instead of the full .NET Framework) - it started to work 2.5 times faster. Now it is above 1 million messages per second on my machine. I added this solution to the test. They are really good.
– Maxim T
Nov 17 '18 at 18:15
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53263258%2fusing-task-yield-to-overcome-threadpool-starvation-while-implementing-producer-c%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.
Using ThreadPool
doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield()
. Thread switching is rather expensive; by doing await Task.Yield()
on a tight loop you add a significant overhead. Besides, you should never take over the whole ThreadPool
, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning
option that requests to not run the task on a ThreadPool
thread (rather, it creates a normal thread with new Thread()
behind the scene).
That said, using a custom TaskScheduler
with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await
continuations would be posted on the same thread, which should help reducing the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler
.
Still, depending on a particular scenario, it's usually better to use an existing well-established and tested tool. To name a few: Parallel Class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.
There is also a whole range of existing industrial-strength solutions to deal with Publish-Subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS) etc).
1
i know about all the well established solutions - i used some of them myself. In terms of performance, Kafka is the best in this range, and NATS as well. But the performance gains are usually at the expense of the reliability. For reliable message processing it is needed to read them from a durable store and don't buffer them in memory. And the tasks are usually not that simple as to process a single byte, but usually take some milliseconds. I usually use WorkStealingTaskScheduler for long CPU bound tasks (it has a pool of custom threads). So it all depends on the context where it used.
– Maxim T
Nov 13 '18 at 3:49
The key in your answer - don't use Task.Yield in a tight loop. I agree 100%. But if the time taken to process each iteration exceeds additional ThreadPool.QueueUserWorkItem then the performance decrease is negligible, but increases responsiveness and tasks cooperation. By the way - it easy to test with a small custom setup. In a tight loop a decrease around 100%, but if some job (around several milliseconds) is done in each iteration - then the decrease is less than 10%.
– Maxim T
Nov 13 '18 at 3:53
@MaximT, in any case I wouldn't overload the defaultThreadPool
with a million of small computational tasks. But let's say you have a custom pool, e.g. you created one withWorkStealingTaskScheduler
. It would be interesting to see the actual benchmarks. E.g., having 10000 tasks each calculating the first 10000 digits of Pi number. Then compare it toThreadPoolTaskScheduler
(make sure to fix the number of threads withSetMinThreads/SetMaxThreads
). Then compare it to a task scheduler with actual thread affinity (AFAIR,WorkStealingTaskScheduler
isn't affine forawait
continuations).
– noseratio
Nov 13 '18 at 4:20
I my experience the performance killer is not a context switching, but very often context switching. If the job is very short as calculating PI numbers, then batching 10000 iterations into one ThreadPool.QueueUserWorkItem will solve the problem. the WorkStealingTaskScheduler is used for longer CPU bound synchronous tasks without async await in them. Look at github.com/BBGONE/REBUS-TaskCoordinator/blob/master/Showdown/… how the mixed job is split into subtasks (HandleLongRunMessage method).
– Maxim T
Nov 13 '18 at 5:04
[The price of context switch] = [context switch duration] / ([job duration]+[context switch duration]). The shorter the job, the bigger the price. For too long jobs - the thread pool is not a solution anyway. The drawback of the ThreadAffinityTaskScheduler is that it is not portable to the NET.Core - it is platform dependent. And this problem could be solved with microbatching - the batch will be processed on a single thread.
– Maxim T
Nov 13 '18 at 7:10
|
show 8 more comments
There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.
Using ThreadPool
doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield()
. Thread switching is rather expensive; by doing await Task.Yield()
on a tight loop you add a significant overhead. Besides, you should never take over the whole ThreadPool
, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning
option that requests to not run the task on a ThreadPool
thread (rather, it creates a normal thread with new Thread()
behind the scene).
That said, using a custom TaskScheduler
with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await
continuations would be posted on the same thread, which should help reducing the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler
.
Still, depending on a particular scenario, it's usually better to use an existing well-established and tested tool. To name a few: Parallel Class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.
There is also a whole range of existing industrial-strength solutions to deal with Publish-Subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS) etc).
1
i know about all the well established solutions - i used some of them myself. In terms of performance, Kafka is the best in this range, and NATS as well. But the performance gains are usually at the expense of the reliability. For reliable message processing it is needed to read them from a durable store and don't buffer them in memory. And the tasks are usually not that simple as to process a single byte, but usually take some milliseconds. I usually use WorkStealingTaskScheduler for long CPU bound tasks (it has a pool of custom threads). So it all depends on the context where it used.
– Maxim T
Nov 13 '18 at 3:49
The key in your answer - don't use Task.Yield in a tight loop. I agree 100%. But if the time taken to process each iteration exceeds additional ThreadPool.QueueUserWorkItem then the performance decrease is negligible, but increases responsiveness and tasks cooperation. By the way - it easy to test with a small custom setup. In a tight loop a decrease around 100%, but if some job (around several milliseconds) is done in each iteration - then the decrease is less than 10%.
– Maxim T
Nov 13 '18 at 3:53
@MaximT, in any case I wouldn't overload the defaultThreadPool
with a million of small computational tasks. But let's say you have a custom pool, e.g. you created one withWorkStealingTaskScheduler
. It would be interesting to see the actual benchmarks. E.g., having 10000 tasks each calculating the first 10000 digits of Pi number. Then compare it toThreadPoolTaskScheduler
(make sure to fix the number of threads withSetMinThreads/SetMaxThreads
). Then compare it to a task scheduler with actual thread affinity (AFAIR,WorkStealingTaskScheduler
isn't affine forawait
continuations).
– noseratio
Nov 13 '18 at 4:20
I my experience the performance killer is not a context switching, but very often context switching. If the job is very short as calculating PI numbers, then batching 10000 iterations into one ThreadPool.QueueUserWorkItem will solve the problem. the WorkStealingTaskScheduler is used for longer CPU bound synchronous tasks without async await in them. Look at github.com/BBGONE/REBUS-TaskCoordinator/blob/master/Showdown/… how the mixed job is split into subtasks (HandleLongRunMessage method).
– Maxim T
Nov 13 '18 at 5:04
[The price of context switch] = [context switch duration] / ([job duration]+[context switch duration]). The shorter the job, the bigger the price. For too long jobs - the thread pool is not a solution anyway. The drawback of the ThreadAffinityTaskScheduler is that it is not portable to the NET.Core - it is platform dependent. And this problem could be solved with microbatching - the batch will be processed on a single thread.
– Maxim T
Nov 13 '18 at 7:10
|
show 8 more comments
There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.
Using ThreadPool doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield(). Thread switching is rather expensive; by doing await Task.Yield() in a tight loop you add significant overhead. Besides, you should never take over the whole ThreadPool, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning option that requests not to run the task on a ThreadPool thread (rather, it creates a normal thread with new Thread() behind the scenes).
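For illustration, keeping a continuous CPU-bound loop off the pool might look like the sketch below. This is a minimal example, not code from the question; DoCpuWork is a hypothetical stand-in for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LongRunningDemo
{
    // Hypothetical stand-in for a continuous CPU-bound loop.
    static void DoCpuWork(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
            Thread.SpinWait(1_000);
    }

    public static Task StartWorker(CancellationToken token)
    {
        // LongRunning hints TPL to run the delegate on a dedicated thread,
        // instead of occupying a ThreadPool thread for the whole loop.
        return Task.Factory.StartNew(
            () => DoCpuWork(token),
            token,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
    }

    static async Task Main()
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
        await StartWorker(cts.Token);
        Console.WriteLine("worker stopped");
    }
}
```

The loop exits cooperatively via the token, so the task completes normally once cancellation is requested.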
That said, using a custom TaskScheduler with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await continuations would be posted on the same thread, which should help reduce the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler.
Still, depending on the particular scenario, it's usually better to use an existing, well-established and tested tool. To name a few: the Parallel class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.
There is also a whole range of industrial-strength solutions for the publish-subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS), etc.).
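As a concrete example of "the right tool", an asynchronous producer/consumer with System.Threading.Channels needs no Task.Yield at all - the consumer simply awaits items. A minimal sketch (names are illustrative):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Channels;

static class ChannelDemo
{
    public static async Task<int> SumViaChannelAsync(int count)
    {
        var channel = Channel.CreateUnbounded<int>();

        var producer = Task.Run(async () =>
        {
            for (int i = 0; i < count; i++)
                await channel.Writer.WriteAsync(i);
            channel.Writer.Complete(); // lets the reader loop finish
        });

        int sum = 0;
        // The consumer awaits items asynchronously - no busy loop and
        // no Task.Yield; the thread is free while the channel is empty.
        await foreach (var item in channel.Reader.ReadAllAsync())
            sum += item;

        await producer;
        return sum;
    }

    static async Task Main() =>
        Console.WriteLine(await SumViaChannelAsync(5)); // 0+1+2+3+4 = 10
}
```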
edited Nov 13 '18 at 2:03
answered Nov 12 '18 at 23:29
noseratio
46.5k14124326
I know about all the well-established solutions - I used some of them myself. In terms of performance, Kafka is the best in this range, and NATS as well. But the performance gains usually come at the expense of reliability. For reliable message processing, messages need to be read from a durable store rather than buffered in memory. And the tasks are usually not as simple as processing a single byte; they usually take some milliseconds. I usually use the WorkStealingTaskScheduler for long CPU-bound tasks (it has a pool of custom threads). So it all depends on the context where it is used.
– Maxim T
Nov 13 '18 at 3:49
After some debate on the issue with other users who were worried about context switching and its influence on performance, I see what they are worried about.
But I meant the do something ... inside the loop to be a substantial task - usually in the form of a message handler which reads a message from the queue and processes it. Message handlers are usually user-defined, and the message bus executes them using some sort of dispatcher. A user can implement a handler which executes synchronously (nobody knows what the user will do), and without Task.Yield that would block the thread while processing those synchronous tasks in a loop.
So as not to be empty-worded, I added tests on GitHub: https://github.com/BBGONE/TestThreadAffinity
They compare the ThreadAffinityTaskScheduler, the default .NET scheduler with a BlockingCollection, and the default .NET scheduler with Threading.Channels.
The tests show that for ultra-short jobs the performance degradation is around 15%. To use Task.Yield without even a small performance degradation, avoid extremely short tasks; if a task is too short, combine shorter tasks into a bigger batch.
[The price of a context switch] = [context switch duration] / ([job duration] + [context switch duration]).
In that case the influence of switching tasks on performance is negligible, while it improves task cooperation and the responsiveness of the system.
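The micro-batching idea above can be sketched as follows - yield only once per batch, so the rescheduling cost is amortized over many items. This is an illustrative sketch; BatchSize and the per-item work are placeholders.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class MicroBatchDemo
{
    const int BatchSize = 1_000; // tune so one batch takes a few milliseconds

    public static async Task<long> ProcessAsync(int totalItems, CancellationToken token)
    {
        long processed = 0;
        for (int i = 0; i < totalItems; i++)
        {
            token.ThrowIfCancellationRequested();
            processed++; // stand-in for handling one message

            // Yield once per batch: the cost of rescheduling to the pool
            // is shared by BatchSize items, keeping the relative overhead small.
            if ((i + 1) % BatchSize == 0)
                await Task.Yield();
        }
        return processed;
    }

    static async Task Main() =>
        Console.WriteLine(await ProcessAsync(10_000, CancellationToken.None)); // 10000
}
```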
For long-running tasks it is better to use a custom scheduler which executes tasks on its own dedicated thread pool (like the WorkStealingTaskScheduler).
For mixed jobs - which can contain different parts: short-running CPU-bound, asynchronous, and long-running code parts - it is better to split the task into subtasks:
private async Task HandleLongRunMessage(TestMessage message, CancellationToken token = default(CancellationToken))
{
    // SHORT SYNCHRONOUS TASK - execute as is on the default thread (from the thread pool)
    CPU_TASK(message, 50);
    // IO-BOUND ASYNCHRONOUS TASK - used as is
    await Task.Delay(50);
    // BUT WRAP THE LONG SYNCHRONOUS TASK inside a task
    // which is scheduled on the custom thread pool
    // (to save ThreadPool threads)
    await Task.Factory.StartNew(() =>
    {
        CPU_TASK(message, 100000);
    }, token, TaskCreationOptions.DenyChildAttach, _workStealingTaskScheduler);
}
The info on the channels: github.com/stephentoub/corefxlab/blob/master/src/…
– Maxim T
Nov 14 '18 at 12:48
It seems I figured it out. Although the performance is the same in all usages, channels have the benefit of a non-blocking wait until the channel can be written to. This happens in the bounded scenarios: the BlockingCollection blocks the thread, not allowing it to write, while channels leave the thread to be used by others.
– Maxim T
Nov 17 '18 at 12:54
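The difference described in that comment can be sketched with a bounded channel: when the buffer is full, WriteAsync awaits asynchronously instead of parking the producer's thread the way BlockingCollection.Add does. A minimal illustration (capacities and counts are arbitrary):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Channels;

static class BoundedChannelDemo
{
    public static async Task<int> RunAsync(int items)
    {
        var channel = Channel.CreateBounded<int>(capacity: 2);

        var producer = Task.Run(async () =>
        {
            for (int i = 0; i < items; i++)
            {
                // When the channel is full this awaits asynchronously,
                // releasing the thread back to the pool - unlike
                // BlockingCollection.Add, which blocks the producer's thread.
                await channel.Writer.WriteAsync(i);
            }
            channel.Writer.Complete();
        });

        int count = 0;
        await foreach (var item in channel.Reader.ReadAllAsync())
        {
            await Task.Delay(5); // slow consumer creates back-pressure
            count++;
        }

        await producer;
        return count;
    }

    static async Task Main() => Console.WriteLine(await RunAsync(6)); // 6
}
```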
I ported the Threading.Channels test to CoreFX (instead of the full .NET Framework) - it started to work 2.5 times faster. Now it is above 1 million messages per second on my machine. I added this solution to the test. They are really good.
– Maxim T
Nov 17 '18 at 18:15
edited Nov 14 '18 at 16:14
answered Nov 13 '18 at 8:14
Maxim T
1047
I have no conclusive idea about the original commenter's motivation, but you should try to avoid having a busy loop waiting for data to arrive; instead, use a mechanism which allows you to trigger the processing.
– Lasse Vågsæther Karlsen
Nov 12 '18 at 13:33
I'd argue that hot loops are bad with or without adding async to the mix - I'd forgive it a lot more if it was await Task.Delay(50) or something, but: it would be even better to use an async activation rather than checking in this way; there is the new "channels" API, for example (nuget.org/packages/System.Threading.Channels), which is designed for async producer/consumer scenarios.
– Marc Gravell♦
Nov 12 '18 at 13:40
@MaximT indeed - it is what I'm using for ordered message queues in SE.Redis: github.com/StackExchange/StackExchange.Redis/blob/master/src/…
– Marc Gravell♦
Nov 12 '18 at 16:20
It is the exact opposite of what the ThreadPool manager tries to do. It makes an effort to limit the number of active TP threads to the ideal number in order to cut down on context-switching overhead. When you use Task.Yield, you add context-switching overhead. If you have too many TP threads that don't execute code efficiently (blocking too much), then use SetMinThreads().
– Hans Passant
Nov 12 '18 at 16:22
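For reference, raising the pool's minimum as the comment suggests is straightforward; the value 64 below is an arbitrary illustration, not a recommendation:

```csharp
using System;
using System.Threading;

static class MinThreadsDemo
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"default min worker threads: {worker}");

        // Above the minimum, the pool injects new threads only gradually;
        // raising the minimum helps when many work items block early on.
        bool ok = ThreadPool.SetMinThreads(Math.Max(worker, 64), io);
        Console.WriteLine(ok ? "raised" : "rejected");
    }
}
```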
Well, of course that's the way it must work. No amount of affordable money is going to buy you a machine with a thousand processor cores. You can't slam the thread pool with that many jobs and expect instant magic. These are important details that belong in the question, btw.
– Hans Passant
Nov 13 '18 at 13:44