Project Description
a high performance threadpool for tiny / small / medium processor work, which is the base of the event_comb procedural programming framework.

this is a high performance threadpool implementation based on AutoResetEvent and a thread-safe / lock-free queue <currently there are two solutions, one based on a locked thread-safe queue and one based on a lock-free queue, with some minor performance difference>. you can also provide your own thread-safe queue implementation.
this is not a 'better' threadpool than the .net internal threadpool, since the .net internal threadpool also needs to handle sync io operations and work items that sleep. instead, this is a threadpool mainly for tiny / small / medium processor work: no sync io operations, no sleep, and it is not good for heavy lock operations. a typical usage is to handle event_comb for a better performance. a benchmark follows for details.
the design of the threadpool is to push each work item as a lambda expression into the thread-safe queue. several reader / worker threads <the default count is the same as the processor count> read work items from the queue and handle them. if there are no more jobs, a thread goes idle and waits on the AutoResetEvent forever. when pushing a job into the queue, if the count of idle threads is more than 0, the push operation also sets the AutoResetEvent. so it performs very well with a large number of small processor work items, while with a small number of huge work items, several processors may be idle.
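
the design above can be sketched roughly as follows. this is not the osi code, just a minimal python illustration of the same idea. the names SketchThreadPool, push, stop and _STOP are hypothetical, a collections.deque stands in for the thread-safe queue, and a threading.Semaphore stands in for the AutoResetEvent <python has no direct equivalent, and a semaphore counts wakeups instead of coalescing them>.

```python
import os
import threading
from collections import deque

_STOP = object()  # hypothetical sentinel: tells exactly one worker to exit


class SketchThreadPool:
    """illustrative only: a job queue, worker threads and an idle counter."""

    def __init__(self, worker_count=None):
        self._queue = deque()                  # append / popleft are thread-safe
        self._wakeup = threading.Semaphore(0)  # stand-in for the AutoResetEvent
        self._idle = 0                         # number of workers currently idle
        self._idle_lock = threading.Lock()
        count = worker_count or os.cpu_count() or 1
        self._workers = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(count)]
        for worker in self._workers:
            worker.start()

    def push(self, job):
        """queue a callable; signal only if some worker is idle."""
        self._queue.append(job)
        with self._idle_lock:
            has_idle_worker = self._idle > 0
        if has_idle_worker:
            self._wakeup.release()

    def stop(self):
        """push one stop sentinel per worker, then wait for all of them."""
        for _ in self._workers:
            self.push(_STOP)
        for worker in self._workers:
            worker.join()

    def _try_pop(self):
        try:
            return self._queue.popleft()
        except IndexError:
            return None

    def _work(self):
        while True:
            job = self._try_pop()
            if job is None:
                with self._idle_lock:
                    self._idle += 1
                # re-check after announcing idleness, so a job pushed in
                # between is not missed
                job = self._try_pop()
                if job is None:
                    self._wakeup.acquire()     # wait until a push signals
                with self._idle_lock:
                    self._idle -= 1
                if job is None:
                    continue                   # woken up: go read the queue
            if job is _STOP:
                return
            job()
```

a caller would create a SketchThreadPool, call push with lambdas, and call stop when done. the sketch does not catch exceptions raised by a job, and it says nothing about the performance of the real implementation.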

for the benchmark, i have written a small piece of code called fake_processor_work, which just spin-waits for a specific time. the unit of the time is the tick <1/10000 millisecond>, and the names below are based on the length of the fake_processor_work; say, threadpool_perf_10 means a large number of work items, each of which spin-waits for 10 ticks. for the details of the test cases, please refer to the source code.
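
to make the idea of a spin-waiting work item concrete, here is a small python sketch. it is an assumption about the shape of such a function, not the fake_processor_work from the source code; the constant TICK_NS and the returned elapsed time are added for illustration only.

```python
import time

TICK_NS = 100  # one tick is 1/10000 millisecond, i.e. 100 nanoseconds


def fake_processor_work(ticks):
    """busy-wait <no sleep, no io> for at least `ticks` ticks.

    returns the elapsed time in nanoseconds, for inspection only.
    """
    start = time.perf_counter_ns()
    deadline = start + ticks * TICK_NS
    while time.perf_counter_ns() < deadline:
        pass  # burn processor time on purpose
    return time.perf_counter_ns() - start
```

in python the call overhead alone is likely larger than 10 ticks, so this sketch is only meaningful for the longer work items.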

on a two-core machine <thinkpad x61t with L7500 and 8G memory, windows 8; the memory is not so relevant, it is just a reference>: managed_threadpool is the .net internal threadpool, slimqless2_threadpool is the implementation with slimqless2 as the lock-free queue, and qless_threadpool is the implementation with qless as the thread-safe queue.
threadpool_perf_10
osi.root.threadpool.managed_threadpool milliseconds 72997
osi.root.threadpool.slimqless2_threadpool milliseconds 29568
osi.root.threadpool.qless_threadpool milliseconds 27493
threadpool_perf_100
osi.root.threadpool.managed_threadpool milliseconds 33476
osi.root.threadpool.slimqless2_threadpool milliseconds 17457
osi.root.threadpool.qless_threadpool milliseconds 16969
threadpool_perf_1000
osi.root.threadpool.managed_threadpool milliseconds 39253
osi.root.threadpool.slimqless2_threadpool milliseconds 32081
osi.root.threadpool.qless_threadpool milliseconds 31227
threadpool_perf_10000
osi.root.threadpool.managed_threadpool milliseconds 54703
osi.root.threadpool.slimqless2_threadpool milliseconds 53616
osi.root.threadpool.qless_threadpool milliseconds 54012
threadpool_perf_100000
osi.root.threadpool.managed_threadpool milliseconds 95503
osi.root.threadpool.slimqless2_threadpool milliseconds 101752
osi.root.threadpool.qless_threadpool milliseconds 101464
threadpool_perf_1000000
osi.root.threadpool.managed_threadpool milliseconds 127507
osi.root.threadpool.slimqless2_threadpool milliseconds 197726
osi.root.threadpool.qless_threadpool milliseconds 199352

on surface rt, with 4 cores + 2G memory and .net 4.0 instead of .net 3.5
threadpool_perf_10
osi.root.threadpool.managed_threadpool milliseconds 65077
osi.root.threadpool.slimqless2_threadpool milliseconds 43086
osi.root.threadpool.qless_threadpool milliseconds 50350
threadpool_perf_100
osi.root.threadpool.managed_threadpool milliseconds 27044
osi.root.threadpool.slimqless2_threadpool milliseconds 20489
osi.root.threadpool.qless_threadpool milliseconds 24799
threadpool_perf_1000
osi.root.threadpool.managed_threadpool milliseconds 38332
osi.root.threadpool.slimqless2_threadpool milliseconds 22694
osi.root.threadpool.qless_threadpool milliseconds 25363
threadpool_perf_10000
osi.root.threadpool.managed_threadpool milliseconds 38758
osi.root.threadpool.slimqless2_threadpool milliseconds 30968
osi.root.threadpool.qless_threadpool milliseconds 31760
threadpool_perf_100000
osi.root.threadpool.managed_threadpool milliseconds 57559
osi.root.threadpool.slimqless2_threadpool milliseconds 52614
osi.root.threadpool.qless_threadpool milliseconds 52834
threadpool_perf_1000000
osi.root.threadpool.managed_threadpool milliseconds 59335
osi.root.threadpool.slimqless2_threadpool milliseconds 100849
osi.root.threadpool.qless_threadpool milliseconds 100538

on windows azure, an 8-core virtual machine with 14G memory and windows 2008 R2
threadpool_perf_10
osi.root.threadpool.managed_threadpool milliseconds 30969
osi.root.threadpool.slimqless2_threadpool milliseconds 9562
osi.root.threadpool.qless_threadpool milliseconds 16078
threadpool_perf_100
osi.root.threadpool.managed_threadpool milliseconds 19562
osi.root.threadpool.slimqless2_threadpool milliseconds 5797
osi.root.threadpool.qless_threadpool milliseconds 6922
threadpool_perf_1000
osi.root.threadpool.managed_threadpool milliseconds 17484
osi.root.threadpool.slimqless2_threadpool milliseconds 10063
osi.root.threadpool.qless_threadpool milliseconds 10297
threadpool_perf_10000
osi.root.threadpool.managed_threadpool milliseconds 19563
osi.root.threadpool.slimqless2_threadpool milliseconds 15422
osi.root.threadpool.qless_threadpool milliseconds 15718
threadpool_perf_100000
osi.root.threadpool.managed_threadpool milliseconds 19125
osi.root.threadpool.slimqless2_threadpool milliseconds 26875
osi.root.threadpool.qless_threadpool milliseconds 27031
threadpool_perf_1000000
osi.root.threadpool.managed_threadpool milliseconds 26375
osi.root.threadpool.slimqless2_threadpool milliseconds 50531
osi.root.threadpool.qless_threadpool milliseconds 50031

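so briefly, for tiny <10 ticks> / small <100-1000 ticks> / medium <10000 ticks = 1ms> work items, the slimqless2_threadpool is faster than the internal managed threadpool on all three machines: about 1.5 to 3.2 times faster for 10-tick work items, with the margin shrinking as the work items grow. for 1000000-tick work items, the managed threadpool is faster on all three machines.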
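
as a worked example of reading these numbers, the python snippet below computes how many times faster slimqless2_threadpool is than managed_threadpool on the windows azure machine. the timings are copied from the results above; the names azure_ms and speedup are only for this illustration.

```python
# work item length in ticks -> (managed_threadpool ms, slimqless2_threadpool ms)
# copied from the windows azure results above
azure_ms = {
    10: (30969, 9562),
    100: (19562, 5797),
    1000: (17484, 10063),
    10000: (19563, 15422),
    100000: (19125, 26875),
    1000000: (26375, 50531),
}


def speedup(managed_ms, slimqless2_ms):
    """above 1 means slimqless2_threadpool finished faster than managed."""
    return managed_ms / slimqless2_ms


speedups = {ticks: round(speedup(*pair), 2) for ticks, pair in azure_ms.items()}

for ticks, ratio in speedups.items():
    print("threadpool_perf_%d: %.2fx" % (ticks, ratio))
```

a ratio above 1 means the slimqless2_threadpool was faster for that work item length, a ratio below 1 means the managed threadpool was faster.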


this threadpool implementation is pretty coupled with other components in the osi project, geminibranch.codeplex.com, so i will leave a non-compilable version in the source code and a beta release in the downloads. i will also update the code / binary here, but for the latest code and binary, please go to geminibranch.
to use the threadpool, you need to reference osi.root.threadpool.dll in your project. call register_slimqless2_threadpool to register the threadpool, then get a valid threadpool by calling resolve_ithreadpool, which returns an ithreadpool interface. do not forget to run osi.root.utt.exe threadpool_test to make sure you have got a good build.
the default build is against .net 3.5, but i have also added a build-with-4.0.cmd into geminibranch/osi, and all the tests pass on surface rt <surely you need a jailbroken version>, except for several performance cases. so if you need some specific .net version, enlisting the code and building from scratch is a good way. after the build, run osi.root.utt.exe * from osi/utt/bin/release | debug to verify basic functionalities.

enjoy, it is my pleasure if any of this can help you.

/*******************************
non-commercial use only
*******************************/

Last edited Jun 10, 2013 at 2:11 PM by Hzj_jie, version 5