HPCguy wrote:
Wed Jul 06, 2022 1:23 am
I feel I need to speak out here, rather than mislead a generation of young minds concerning HPC. Task models like HPX are no panacea. They are often demonstrated on toy problems with specific core kernels, but are rarely used in production codes with many constraints. If you can name a counterexample in a code containing more than 250,000 lines with a lifetime of more than ten years, please point it out. The lack of task models in large, long-lived multiphysics codes is not for lack of trying. Theory applied to small or focused problems is one thing; maintaining the model in large production codes often exposes its inefficiencies and drawbacks. In my opinion, the only task-based programming environment that has teeth and potential for HPC multiphysics is the Loci model from Mississippi State University. All the others that get a lot of PR are not widely used in spite of the massive hype, and have severe drawbacks. As for my credentials, the last time I was asked to review task-based academic papers, I declined, saying I wasn't qualified, and was surprised to get pushback from the source, saying they knew full well I was an expert in precisely the subject matter, and would I please review the paper rather than shirk my responsibilities to the academic community.
That said, for *many* non-production-scale HPC and other applications, event-based threads, aka tasks, are wonderful, and are in fact often the best way to go.
I have over 30 years of experience with multiphysics HPC codes based on a task model. I started and directed the writing of two codes (one for structures and one for fluids) that are still in use and have hundreds of thousands of lines of code. Others have taken over, but the codes still live and still run on some of the largest HPC systems in the world.
You will find public versions of one of those codes at https://bitbucket.org/frg/aero-s/downloads/
I started this code over 30 years ago. Many students have contributed, and it is not the cleanest code out there. But it still lives, and it is the kind of counterexample you asked for. I'll also mention that I am a co-winner of a Gordon Bell prize with a derivative of this code.
As for HPX, it is run on extremely large astrophysics problems.
As for my more recent work, I have written for a client what that client has measured as the fastest parallel multi-frontal solver on the market, along with other sparse solvers. I could not have written it if it weren't for tasks. Parallel for loops are clunky for such a program.
If anybody thinks a parallel multi-frontal solver is a toy problem, I challenge them to write one with parallel for loops.
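To make the point concrete: in a multi-frontal method, a front can only be processed once all of its children in the elimination tree are done, so the work is shaped like a tree rather than a flat loop. The following is a minimal, single-threaded sketch of dependency-counted task scheduling over such a tree. The function name `schedule_fronts` and the `parent` array layout are assumptions made for illustration, not taken from any real solver; a real implementation would hand ready fronts to worker threads instead of a simple queue.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical elimination tree: parent[i] is the parent of front i, or -1 for a root.
// Returns the order in which fronts become ready and are "processed".
std::vector<int> schedule_fronts(const std::vector<int>& parent) {
    const std::size_t n = parent.size();

    // Count how many children each front is still waiting for.
    std::vector<int> pending_children(n, 0);
    for (int p : parent) {
        assert(p < static_cast<int>(n));   // parent index must be valid
        if (p >= 0) ++pending_children[p];
    }

    // Leaves have no dependencies, so they are ready immediately.
    std::queue<int> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (pending_children[i] == 0) ready.push(static_cast<int>(i));

    std::vector<int> order;
    while (!ready.empty()) {
        const int front = ready.front();
        ready.pop();
        order.push_back(front);            // placeholder for factoring this front

        // Finishing a front may make its parent runnable.
        const int p = parent[front];
        if (p >= 0 && --pending_children[p] == 0)
            ready.push(p);
    }
    return order;
}
```

Calling `schedule_fronts({2, 2, 4, 4, -1})` processes the three leaves first, then front 2, then the root. The readiness of each task is driven by its dependencies, which is awkward to express with a sequence of parallel for loops over an irregular tree.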
One of my projects after that was a different type of sparse solver (so-called Left Looking) using C++20 coroutine-based tasks. I have been amazed at how much faster I reached milestones similar to those of the multi-frontal solver, because the language-supported coroutine approach helps me think more naturally about asynchronous tasks and write the operations much more compactly. That makes writing, and even more importantly re-reading and reviewing, the code much easier.
One thing that has bothered me is that several people have pontificated here about coroutines without seeming to have basic knowledge of the details of C++20 coroutines. It seems most statements are based on library-based coroutines such as Boost's, which are so far from the C++20 mechanism that you cannot draw any conclusion from one and extend it to the other.
In fact, since HPCguy mentioned protothread, I'll mention that I took a look at protothread and how it is implemented. A look at C++20 coroutines reveals that they take a very similar approach; protothread might as well have been called protocoroutines. But C++20 coroutines are language-supported, support local variables (by design, protothread cannot preserve local variables across a suspension), and offer many more tools. For example, on @dthacher's point about memory allocation, you are given the tools to avoid dynamic memory allocation as well.
I have said they are complex to understand because of the way they were designed to be an extremely flexible building tool for library writers. I am writing a library so others can use them on the pico. If you don't want to use it, that's not my business, but I don't see a reason to attack my attempts as futile.
This will be my last post in this thread and I think it should be closed.