## Liberation through Computer Literacy

jahboater
Posts: 7700
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Mon Aug 05, 2019 8:30 pm
End result is that the binaries are exactly the same size!
At the end of the day its only 182 bytes and it may take far more than that to affect the binary size.
The "size" command may show the difference. It should be in the "text" part if its immutable like it would be in C.

Anyway, who cares about a few bytes when users here open 582 tabs in their browsers

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

John_Spikowski,
I'm curious how the HASH method relates to the approach being taken?
I'm not sure what you are asking there.

The fundamental problem is how do we know that one word is an anagram of another.

That is to say how do we know that some arrangement of letters is somehow the same as any other rearrangement of the same letters.

How can we do that?

1) Sort all the letters of each word, keeping duplicates, and then check we have the same result.

2) Count the occurrences of each unique letter in each word, and the check we get the same result.

3) Use some maths. Assign a prime number to each letter of the alphabet. Translate each letter of our words to that number and multiply them together giving some huge number. If that huge number is the same then we have an anagram.

See Fundamental Theorem of Arithmetic: "https://en.wikipedia.org/wiki/Fundament ... arithmetic

4) I have no idea. But I guess there is another way.

Personally I'm not sure we should call any of the above a "hash". Generally we try to avoid hash collisions. But in this case we want to deliberately create hashes that are the same for anagrams.

This whole stupid problem turns out to be more interesting than one might think at the outset.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 8905
Joined: Tue Mar 18, 2014 11:47 am

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Mon Aug 05, 2019 8:45 pm
Paeryn,
Wouldn't your while loop in output_anagrams ...
Turns out that "for i in 0..size {..." is not only easier on the eyes but seems to be a few percent faster. I have not measured it rigorously but it's a win anyway.
When I discovered that "for i in 0..size" surprisingly meant "for i in 0, 1, ... , size-1" is when I became disappointed in the syntax of Rust.

When I discovered the resulting executables spent too much time boxing and unboxing complex numbers is when I became disappointed in Kotlin.

When I discovered that my code ran about 50% slower because it was impossible to turn off array bounds checking is when I became disappointed with the Go programming language.

When Love Story peaked at 2 behind Kelly Clarkson’s My Life Would Suck Without You is when I lost interest in Swift.

When I discovered C♯ was the same as D♭ is when I lost interest in the .NET framework.

When the lead developer used a Hamel basis and irrational line numbers to implement n-level meta programming is when I lost interest in FidoBasic.

That leaves the D programming language; however, when I tried to install it, I ended up with Empire Wargame of the Century instead.

I remember when Fortran was popular, not because it was a great language, but because the compilers were everywhere, at least worked and it was good enough. Today I write most of my code in C for similar reasons.
Last edited by ejolson on Wed Aug 07, 2019 1:32 am, edited 3 times in total.

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

ejolson,

1) If you have a gripe about Rust syntax, so what? We all have gripes about all kinds of languages syntaxes.

2) Kotlin requires the Java JVM therefore lack of performance and causing global warming and climate change is not a concern for those selfish guys.

3) Ditto the Go language.

5) C# - See Kotlin but with a differnt VM.

6) D - No idea.

No. What we want is the speed of C and the correctness of ALGOL.

Luckily the Rust guys are working in that.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 8905
Joined: Tue Mar 18, 2014 11:47 am

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Mon Aug 05, 2019 9:49 pm
John_Spikowski,
I'm curious how the HASH method relates to the approach being taken?
The fundamental problem is how do we know that one word is an anagram of another.
A hash table is one way of implementing associative arrays that doesn't on average experience the O(n) slowdown exhibited by linear lists. Since it appears the built-in associative arrays in Script Basic are not based on hash tables, directly using such a data structure should result in significant computational savings unless a digital apocalypse happens first.

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

When it comes to ScriptBasic the problem of detecting anagrams is neither here nor there. Reduce that to zero time and you still have problems:

As far as I can make out from posts here:

1) ScriptBasic has no arrays in the normal sense of indexing some memory some place.

Rather it has linked lists. Requiring a search of the list to find the required array indexed item.

2) ScriptBasic has no associative arrays. In the normal sense of hash tables or binary trees etc.

Rather it has linked lists. Requiring a search of the list to find the required item by key.

As such I'm not surprised it's 100,000 times slower in the anagram challenge.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 8905
Joined: Tue Mar 18, 2014 11:47 am

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Mon Aug 05, 2019 11:20 pm
1) ScriptBasic has no arrays in the normal sense of indexing some memory some place.

Rather it has linked lists. Requiring a search of the list to find the required array indexed item.
Though I haven't looked at the code, the performance of regular arrays in ScriptBasic appears similar to other Basic interpreters.

While it is possible ScriptBasic further supports a kind of sparse array using linked lists where the indexing has holes in it, as no code translated from other programming languages would likely use this feature, the linked lists should not generally impact the speed.

I wonder why Rust is right between Logo and Scratch on the current TIOBE popularity chart. Does that mean it would make a good first programming language?

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

ejolson,

I have no idea about the internals of ScriptBasic other than the clues I get here.

Don't care. It's 10,000 times slower than it need be to get a result. Far to much global warming for my taste.
I wonder why Rust is right between Logo and Scratch on the current TIOBE popularity chart. Does that mean it would make a good first programming language?
Yes.
Memory in C++ is a leaky abstraction .

John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA

### Re: Project Digital Apocalypse Not Now

ScriptBasic arrays allow you to build complex matrix structures of any type with no pactical limitations.

Feature set takes precedence over speed.

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

The "practical" limitation is getting a result before the author of the program dies.

Or the Sun goes super nova, or the heat death of the universe.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 8905
Joined: Tue Mar 18, 2014 11:47 am

### Re: Project Digital Apocalypse Not Now

John_Spikowski wrote:
Tue Aug 06, 2019 1:41 am
Feature set takes precedence over speed.
For those who have just joined or still spectating, the present thread is discussing the Insane British Anagram Challenge described in

https://www.raspberrypi.org/forums/view ... 5#p1504861

As elaborated on in the post

https://www.raspberrypi.org/forums/view ... 0#p1506442

everyone who writes a working program wins. Since the goal of the challenge is to promote computer literacy and thereby avoid the digital apocalypse, speed of execution is secondary.

There are a number of ways to get started with this coding challenge. One way is to examine the programs at the link

http://fractal.math.unr.edu/~ejolson/pi/anagram/

and translate one of them into your language of choice.

Another option is to examine the algorithm described using English in the post

https://www.raspberrypi.org/forums/view ... 5#p1508630

and code that algorithm in your language of choice.

Another way would be to look at the codes and algorithms already presented and write a program that solves the problem in a notably different way.

No matter what approach is taken, any additional entries in the insane bar chart of fame would be greatly appreciated. Languages missing that I would personally like to see include Algol 68 Genie, C#, COBOL, D, Fortran, go, Haskell, Kotlin, Scheme, Smalltalk and Swift. Algorithms missing include red-black trees, 2-3 trees, Fibonacci heaps, random UUIDs and hash tables.

I find it amusing that the insane British dictionary is arguably more useful for promoting computer literacy than English literacy. As widespread computer literacy will be needed to avoid the digital apocalypse, beginners are specifically encouraged to try this challenge. Constructive code review and help will hopefully be provided in all cases.

ejolson
Posts: 8905
Joined: Tue Mar 18, 2014 11:47 am

### Re: Project Digital Apocalypse Not Now

DougieLawson wrote:
Sat Aug 03, 2019 5:22 pm
jahboater wrote:
Sat Aug 03, 2019 5:17 pm
Heater wrote:
Sat Aug 03, 2019 3:50 pm
Edit: When I replace all those short circuit operatirs, "++", "|=", etc, with their long hand equivalents in fftbench it runs at exactly the same speed.
Not surprising I guess.

So which version did you think was the most readable?

I have been looking at C for so many decades that I find the += form quicker to mentally scan. For a = a + n you have to check its actually "a" used in both places, with a += n you can see at a quick glance that its just adding n to a.
Just me.
I prefer i++ to i =+ 1 to i = i + 1. If you want to go to extremes write it out COBOL style

Code: Select all

ADD FOO TO BAR GIVING FOOBAR ROUNDED.
More than one way to express the same thing isn't necessarily bad, especially when the more specialized notations have a conciseness which is clearer.

Do you think the insane British anagram challenge could be done very easily in COBOL?

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

ejolson,
The advantage of using a popular and established programming language is that you need only examine your own code when fixing bugs. With new and emerging technologies the bug could be anywhere in the tool chain...
At risk of sounding like a Rust fanboy I have to say that Rust is no different from C/C++ in this respect. If the segfault I have with that library in Rust were instead the equivalent bug in a library used from C I would be in the same position. I would have the same questions: Is it a bug in my source? Is a bug in the library? Is it a bug in the compiler? Admittedly libs for C/C++ that have been around along time and widely used are less likely to have bugs.
...This is not good in a first programming language designed for beginners.
That statement got me recalling things I perhaps would rather not remember. Time for an anecdote...

The first time anyone ever wanted to pay me to write code in a high level language was about 1981. The company was Northern Telecom, the language was C. I knew nothing of C so they set me up with a computer and a C compiler and I set about learning it.

Very soon I discovered that pretty much any error in my code would cause the program to crash out, with no sensible error message pointing me to my faulty source. Often such a crash would hang the whole machine and require a reboot. Or, worse still, functions that I had tested and run a lot and was confident in, would start to produce wrong results at random when I introduced new code to he program. It was nigh on impossible to find the problem by inspecting my source code.

After a week of this intense frustration and thinking there had to be something wrong with my compiler and/or computer I turned to a colleague and asked something like "Is it normal that in a high level language we can write code that crashes without an error message or brings down the whole machine like that?". Well, you could hear a pin drop. The whole office went quiet. They were all looking at me in amazement, as if I was the biggest idiot they had ever seen. They were old C hands and knew the score. My colleague, bless him, took me aside and, speaking slowly as if to a retard, explained the situation. That was the moment I understood that C was not a high level language. Rather it is a means of writing assembler without the architecture specific instructions. After that all went well, I had written lots of assembler before that so I could deal with it.

Was I an idiot for asking that question? Perhaps but... my experience of high level languages in tech school and uni. before that were in BASIC and Algol. In both cases the compiler/interpreter would never just bomb out. There were no undefined behaviors. If you did something the compiler did not like it would stop and tell you where your mistake was, either at compile time or run time.

With those memories in mind I conclude that C/C++ are most certainly not good as a first programming language for beginners to programming.

BASIC and Algol were. They stopped at any undefined behavior and issued helpful error messages rather than crash or give randomly wrong results.

Rust, with it's absence of undefined behaviors and excellent error messages is far more beginner friendly, in the mold of Algol.
Memory in C++ is a leaky abstraction .

John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA

### Re: Project Digital Apocalypse Not Now

I hope to have the HASH extension module issue resolved soon. Replacing associative arrays with the HASH extension module shows a significant improvement in execution speed. Hope to repost something soon.

gkreidl
Posts: 6345
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Tue Aug 06, 2019 10:34 am
That was the moment I understood that C was not a high level language. Rather it is a means of writing assembler without the architecture specific instructions.
+1
Minimal Kiosk Browser (kweb)
Optional fullscreen kiosk mode and command interface for embedded applications
Includes omxplayerGUI, an X front end for omxplayer

rpdom
Posts: 19504
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

### Re: Project Digital Apocalypse Not Now

ejolson wrote:
Tue Aug 06, 2019 8:48 am
Do you think the insane British anagram challenge could be done very easily in COBOL?
Not easily, but it could be done. I did something like it about 30 years ago in COBOL to enable fuzzy matches on a database.

jahboater
Posts: 7700
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Tue Aug 06, 2019 10:34 am
That was the moment I understood that C was not a high level language. Rather it is a means of writing assembler without the architecture specific instructions. After that all went well, I had written lots of assembler before that so I could deal with it.
Are you saying that the definition of a high level language is one that checks everything for beginners?

Take a common pointer de-reference as an example.
It has to be checked to make sure its not NULL and indeed pointing to memory addressable by the process.
If it is a write, then the destination page must be checked to see that its writeable, and finally the alignment must be checked (if needed by the hardware). All that must be done to be sure the de-reference will not fail.

No HLL with any pretensions of high performance is going to do all that.

OK, some gain may be had by restricting a pointer's domain allowing the compiler to do some of the checks. But you then end up having to rely on languages like C to do the real work.

jalih
Posts: 194
Joined: Mon Apr 15, 2019 3:54 pm

### Re: Project Digital Apocalypse Not Now

rpdom wrote:
Tue Aug 06, 2019 10:58 am
ejolson wrote:
Tue Aug 06, 2019 8:48 am
Do you think the insane British anagram challenge could be done very easily in COBOL?
Not easily, but it could be done.
Simply use keyed data set instead of dictionary?

John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA

### Re: Project Digital Apocalypse Not Now

For grins, I"m going to give the SQL approach a try.

ejolson
Posts: 8905
Joined: Tue Mar 18, 2014 11:47 am

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Tue Aug 06, 2019 10:34 am
With those memories in mind I conclude that C/C++ are most certainly not good as a first programming language for beginners to programming.
Crashing the entire operating system with a user-level program written in C did not happen with the original Kernighan and Ritchie C compiler on the PDP-11 nor does it happen now with the GNU C compiler on the Raspberry Pi.

When I was growing up my parents had the idea--probably not original--that training wheels on a bicycle only made it more dangerous. Learning how to balance the bike properly in the beginning was preferable. To do that some instruction was needed. I have fond memories of my father running after the bike to catch it before I fell. Moreover, it did not take long for me to ride well.

Some people have suggested it is educationally important for a beginning programmer to learn assembly language. The resulting knowledge is supposed to provide the intuition about how a computer works necessary for understanding, reasoning about and effectively using a high-level language. From this point of view C programming, though not assembler, may be close enough to teach a beginner about memory addresses, addressing, indirection and the types of arithmetic and conditional operations physically present in the hardware of a typical processor.

Could it be that putting training wheels on the beginner's first programming language only makes learning computer science slower and more difficult in the end?

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

ejolson,

Kernighan and Ritchie were lucky. They did not have to learn C on MS-DOS

Rogue pointers and out of bounds array access can certainly bring down embedded systems and such that have no MMU.

I'm all for not putting training wheels on kids bikes. Worked for me. However there is overwhelming and ever mounting evidence to suggest that mature, experienced, skilled, C/C++ programmers cannot produce safe code. As such they need all the "training wheels" they can get.

Now, when a kid falls off his bike he might scrape is knee. Not much of a disaster and localized to that one kid. When programmers ship buggy operating systems, web servers and all kind of other software it can result in major malfunctions and security issues for the users of that code. As such the consumers of code should insist that the developers use "training wheels" all the time.

I'm all for people being exposed to assembler as early on as possible. Meanwhile the high level language they use should completely specified and as bullet proof as possible.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 8905
Joined: Tue Mar 18, 2014 11:47 am

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Tue Aug 06, 2019 3:29 pm
I'm all for not putting training wheels on kids bikes. Worked for me. However there is overwhelming and ever mounting evidence to suggest that mature, experienced, skilled, C/C++ programmers cannot produce safe code. As such they need all the "training wheels" they can get.
Rather than training wheels, I might be better off with an adult tricycle.

I agree that a low-level assembly-like language plays an important role in learning computer science. It would appear from your anecdote that learning assembly language as a beginner was extremely useful. Imagine how much harder it would have been for a person who knew only Basic to write that C code.

The fact that a first programming language often ends up being a person's only programming language confounds the distinction between languages designed for teaching, languages designed for hobbyists and languages designed for experienced software engineers. Further confusion may arise from the profusion of domain-specific languages that make one task easy while making others difficult.

Even for experienced software engineers, not all programming tasks can be accomplished without directly referring to computer hardware features such as memory addresses, management units, input-output ports, registers, atomic operations, cache flush, interrupt controllers and so forth. What happens when programmers accustomed to training wheels have to deal with unsafe constructs is evident with the Rust spmc single-producer multiple-consumer software library.

While programming in C may be more prone to error than Python, it is likely programmers writing device drivers for the Linux kernel would make even more errors if all their other work involved the use of training wheels. As a result, increasing the difficulty of the computer-science curriculum by employing C as the first language may actually help prevent the digital apocalypse and its aftermath.
Last edited by ejolson on Tue Aug 06, 2019 8:56 pm, edited 3 times in total.

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

Certainly there are times when one needs to take the guards off the wood chipper or bypass that annoying overload trip.

Probably better not to make a habit of it, beginner or otherwise.
Memory in C++ is a leaky abstraction .

Heater
Posts: 18860
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Project Digital Apocalypse Not Now

jahboater,
Are you saying that the definition of a high level language is one that checks everything for beginners?
Not really, I was perhaps over emphasizing my point a bit there. I don't want get into a debate about the meaning of "high level language", there is of course a graduation in height.
Take a common pointer de-reference as an example.
It has to be checked to make sure ...
...
No HLL with any pretensions of high performance is going to do all that.
No necessarily.

The Rust guys claim is that there is no run time overhead caused by their pointer checking. There is no pointer value checking going on at run time. It's all done at compile time.

How is that possible?

Well, if your language does not allow you to read via an uninitialized pointer, if it does not allow the creation of null pointers, if it does not allow you to point at the wrong thing, if it does not allow you to share pointers for write operations, then there is no need for pointer checking at run time. You cannot create a rogue pointer in the first place.
OK, some gain may be had by restricting a pointer's domain allowing the compiler to do some of the checks. But you then end up having to rely on languages like C to do the real work.
The claim is not just some gain but total avoidance of rogue pointers at compile time.

There is no run time system required by Rust. Like C, it can produce libraries with no standard library and no run time required. No dependence on C at all.

I have not verified these claims myself yet. Time to break out objdump. But we have seen that Rust matches C/C++ on performance and produces similar sized binaries. I'm almost convinced already. Convinced enough that I think it's worth pursuing further.
Memory in C++ is a leaky abstraction .

jahboater
Posts: 7700
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

### Re: Project Digital Apocalypse Not Now

Heater wrote:
Tue Aug 06, 2019 5:58 pm
Well, if your language does not allow you to read via an uninitialized pointer, if it does not allow the creation of null pointers, if it does not allow you to point at the wrong thing, if it does not allow you to share pointers for write operations, then there is no need for pointer checking at run time. You cannot create a rogue pointer in the first place.
Yes, I can see that.
But what about a function in a separate TU, a separately compiled and linked module, maybe a public library function.

The function has been been presented with a "char *" as an input parameter. It must do all those checks by hand to ensure a crash free experience? Or perhaps the function body is declared "unsafe". Or perhaps rustc/cargo or whatever it is, expects to manage the entire project including all the libraries. Uuugh.

Or take the simple library function strcpy(), does it also get passed the lengths of both argument strings? Otherwise how can it do the bounds check? Perhaps it avoids that sort of thing, by including small string handling stuff within the language itself.

I'd be interested in what objdump comes up with.