Saturday, September 17, 2011

An Introduction to JS-Ctypes

Update

If you have time follow the whole story in es-discuss mailing list while if you don't have time here the quick summary:
js-ctypes purpose is different from JS.next typed structs/arrays so it looks like it was my mistake to compare tomatoes and potatoes.
I bet everybody else in this world could have compared these two different beasts due identical name, look, and similar usage.
If ctypes are not used outside JS these are not JIT optimized in any case so now we know why performances are so slow compared with JS code.

On new Struct({literal:pairs}) VS new Struct(literal = pairs) there is still no answer and even if it's obviously possible to avoid an object creation per each created instance, recycling a single object and refreshing its properties same as we could do with properties descriptors and Object.defineProperty I have pointed this out in that way on purpose since I can already see a massive usage of that unoptimized pattern and I would like to know that engines are able to optimize that pattern Just In Time or tracing it.

More questions, "flames", and answers about this topic in the link I have already posted at the very beginning.


A few days ago I had a quick chat with Ben Green about statically defined JavaScript structs.
He reminded me "somebody wrote something about faster JS objects" and I remember I saw it as well but I could not find the bloody source until I crashed again into Brendan Eich blog, more specifically the My TXJS talk post.

JavaScript Binary Data

The slide I am talking about is at page 14:

const // the statically defined and typed structs
Point2D = new StructType({x:uint32, y:uint32}),
Color = new StructType({g:uint8, g:uint8, b:uint8}),
Pixel = new StructType({point: Point2D, color: Color}),
// the static collection
Triangle = new ArrayType(Pixel, 3);

new Triangle([
{point: {x:0, y:0}, color: {r:0, g:0, b:0}},
{point: {x:5, y:5}, color: {r:10, g:10, b:10}},
{point: {x:10, y:0}, color: {r:20, g:20, b:20}}
]);

"Mind Blown!" as first reaction, then I decided to investigate a bit more during the evening in order to bring some better feedback and have a better understanding of this concept ... but how did I do that?

js-ctypes and Mozilla

Even if landed and approved only recently in JS.next, ctypes have been available in Firefox since version 4.
I like the fact Mozilla keeps surprising me as one of the most advanced environment when it comes to JavaScript world but before getting too excited, we'd better keep reading this post.

The (ideal) Purpose

Dave Mandelin in Know Your Engines slides enlightened us describing how things get faster behind the interpreted JavaScript scene. As scripting language developers we would like to do not care at all about details such "do not change variables type" but as I have asked during falsy values conference: "what about objects and their properties?"
JS-Ctypes seem to be the "ideal kick asses performances" trick we all were waiting for: an explicit, yet scriptish, way to describe well known structures in order to make the engine able to optimize and compile these structures runtime and boost up performances.
This concept is not new at all in programming world.

Cython

From Wikipedia:
Cython is a programming language to simplify writing C and C++ extension modules for the CPython Python runtime. Strictly speaking, Cython syntax is a superset of Python syntax additionally supporting:
- Direct calling of C functions, or C++ functions/methods, from Cython code.
- Strong typing of Cython variables, classes, and class attributes as C types.
Cython compiles to C or C++ code rather than Python, and the result is used as a Python Extension Module or as a stand-alone application embedding the CPython runtime
I do believe it comes natural to compare js-ctypes to Cython and I am pretty sure initially this was the exact purpose of the Mozilla extension or, at least, Mozilla folks idea.
Ironically this is the same reason js-ctypes are not available by default in Firefox and others except via extensions.

// if not in an extension, deprecated but
// the only way to bring js-ctypes inline in a web page
netscape.security.PrivilegeManager.enablePrivilege('UniversalXPConnect');

// import ctypes
Components.utils.import("resource://gre/modules/ctypes.jsm");
Bear in mind above code will not work online. In order to test ctypes in Firefox we need to accept privileges risks offline ( file://ctypes.test.html ).
The reason is simple: rather than decouple the power of ctypes from the ability to use compiled libraries or dll, Mozilla put everything into a single module making its usage basically pointless/impossible for Web applications: big mistake!

A Reasonable Shim

It's about 3 years or more I am writing examples and proposals in this blog about "strict typed JavaScript" but this is not the case.
If we want to shim in a good way js-ctypes we should actually forget the type part or performances will be extremely compromised per each bloody created object.
Unit test speaking, once we are sure that Firefox runs all our cases, we'd better trust nothing bad will happen in all shimmed browsers.

try {
netscape.security.PrivilegeManager.enablePrivilege('UniversalXPConnect');
Components.utils.import("resource://gre/modules/ctypes.jsm");
} catch(ctypes) {
// a minimal ctypes shim by WebReflection
this.ctypes = {
ArrayType: function ArrayType(constructor, length) {
var name = (constructor.name || "anonymous") + "Array";
return Function("c", "".concat(
"return function ", name, "(o){",
"var i=(o||[]).length;",
length ? "if(i!=" + length + ")throw 'wrong length';" : "",
"if(!(this instanceof ", name, "))",
"return new ", name, "(o);",
"this.length=i;",
"while(i--)",
"this[i]=new c(o[i]);",
"};"
))(constructor);
},
StructType: function StructType(name, fields) {
for (var key, current, proto = {}, init = [], i = 0; i < fields.length; ++i) {
current = fields[i];
for (key in current) {
if (current.hasOwnProperty(key)) {
init.push("this['" + key + "']=o['" + key + "']");
proto[key] = null;
}
}
}
return Function("p", "".concat(
"function ", name, "(o){",
"if(!(this instanceof ", name, "))",
"return new ", name, "(o);",
init.join(";"),
"}",
name, ".prototype=p;",
"return ", name
))(proto);
}
};
}
To make things even faster, I have adopted an "inline compiled JS" technique so that each defined struct will do most basic tasks per each instance creation.
Following an example about js-ctypes usage:
// as is for native cnstructor, no need to "new"
const Point2D = ctypes.StructType(
"Point2D", // the struct name
[ // the struct description
{x: ctypes.int},
{y: ctypes.int}
]
);

// a struct can be used to define a collection of same type
const Segment2D = ctypes.ArrayType(
Point2D, // the value type
2 // the length
);

// if length is specified, this must match during construction

// if no length is specified any amount of elements can be created
const Line2D = ctypes.ArrayType(Segment2D);

// no need to invoke all constructors
// as long as the Array/Object structure
// matches the defined one
var line = Line2D([
[
{x: 0, y: 0},
{x: 10, y: 10}
], [
{x: 10, y: 10},
{x: 20, y: 20}
], [
{x: 20, y: 20},
{x: 30, y: 30}
]
]);
Even if geometrically speaking above example does not make much sense, being a line by definition represented by infinite number of points, I am pretty sure you got the logic.

Still NOT JS.next

The struct definition is slightly different from the one shown by Brendan Eich but at least the ArrayType signature seems to be similar.
If what Brendan showed is actually true, we will not have a way to define statically typed getters and setters.
Not that a function per each get/set can improve performances, but I consider this a sort of limit over other statically typed programming languages.

10X Slower

Surpriiiiiiiiiiiseeeeeee!!! Even Firefox Nightly performs like a turtle on steroids over statically typed collections and here the test you should save in your desktop and launch via file protocol.
If you see the alert, ctypes have not been loaded ... but if you test in on Firefox via file protocol and you allow the module, you will not see any alert but an actual benchmark of three different types of collections:
  • a generic Array of Objects
  • a typed collection of typed objects
  • an Int32Array implementation over int values with an object creation per each loop iteraction
I don't know what's your score ( and I could not manage to test it via jsperf ) but at least in my MacBookPro numbers are 110ms for ctypes VS 19 or 16 for other two tests.

What The Fuck Is Going On

Pardon my french but I could not describe in a better way my reaction ... however, I have an idea of what's happening there ...

Slow Binding

If ctypes are checking and transforming runtime all values in order to provide nicely written Errors somebody screwed up the speed boost idea here. I would rather prefer to see my browser implode, my system crash, my MacBook explode than thinking every single bloody object creation is actually slower than non statically defined one!
"check all properties, check all types, convert them into C compatible structs, bring them back to JS world per each index access" ... I mean, this cannot be the way to make things faster.
The operation could surely be more expensive in therms of Struct and List definitions but for fuck sake these cannot be trapped behind the scene: these must be instantly available as hidden pre compiled/pre optimized objects and if some assignment goes wrong just exit the whole thing!

Static Is Not For Everybody

Let "week end hobbyists" use JS as they know but give JS the native power of C. Don't try to save poor JS kids/developers here, you either bring this power in or you don't.
Any application that will screw an assignment over a statically typed collection or struct does not deserve a place in the web, as well as any sort of broken C code cannot be compiled or it will kill the execution if something goes wrong runtime.

I am not joking here, think about those developers that actually know what they are doing and forget for once the "too easy to use" concept: we all desire to handle statically typed code via JS and we expect a massive performances boost.

Double Memory Consumption

The typed part of JavaScript seems to ignore a little detail: every object will require both non statically typed structure, {x: Number, y: Number} plus its statically typed equivalent: Point2D.
I am not sure engines can optimize that much here and thinking about mobile platforms I wonder if TC39 team is actually thinking "Desktop only" ... WebCL seems, once again, a much better alternative than ctypes here 'cause if all these operations will mean higher memory footprint and slower interaction we are in a no-go specification that should never land in JS world.
We really can implement by ourself strict type checks so either ctypes bring something powerful and fast or I can see already a lot of effort, implementation speaking, for zero income, real use cases speaking.

const Point2D = ctypes.StructType(
"Point2D", // the struct name
[ // the struct description
{x: ctypes.int},
{y: ctypes.int}
]
);

// how it is now in ES.next too
var p = new Point2D(
{x: 123, y: 123} // why on earth!
);

// how it should be in ES.next
var p = new Point2D(
// no object dependeny/creation
x=123,
y=123
);
Above example is just one out of millions way to better initialize a statically typed structures. Since JS.next will bring new sugar in any case, unless these objects used to initialize a structure will be completely ignored/discarded runtime, creating holes in therms of object reusability, the creation of a complementary object per each static instance is a non-sense.
In few words, no need to overcomplicate engines when these will be already compatible with named defaults function arguments, isn't it?

As Summary

C could land into JavaScript but it must be done properly. A too hybrid solution could bring double problems and all I have tried to do in this post is collaborate with the initiative bringing thoughts and tests.
I hope this part will be specified and implemented properly, removing the "native dll binding" we don't need on te web, neither we do for node.js modules.
Sure it's a nice have, but once we can write proper modules based on statically typed structs and collections, there won't be such big need of pre-compiled C stuff and all cross platform problems at that point will be solved on browser engine level, rather than on JS specific C module side.
Any sort of thoughts and/or clarification will be more than appreciated but right now all I can say is: avoid this extension, don't try to screw with native system libraries, don't use this extension thinking it will bring more efficient, fast, powerful, code into your app.
Thanks for your patience

7 comments:

*/* said...

though statically typed things in Brendans talk and JS-Ctypes look similar they are completely different beasts

main focus of ctypes is calling c functions from external libraries and making js functions callable by external libraries
which means there must be huge overhead compared with the case when all objects are internal to js engine and it can optimize code

ctypes were mainly created to replace binary XPCOM components in firefox extensions, so when you look at them as means of adding type annotations they look pointless

Andrea Giammarchi said...

as I wrote in es ml
"I know current Mozilla implementation is not exactly what will be in JS.next but it was the only way I had to test efficiency of this proposal and performances speaking it looks like an epic fail so far and please feel free to correct me as much as possible, thanks."

Since the JS.next proposal is basically the same I wonder if at least my considerations are valid in therms of double memory usage and slow type check.

Thanks to point out that in any case, appreciated :)

Christopher said...

@Andrea can you please explain why on Earth you would use String.prototype.concat instead of the much-more-performant + operator?

Sam Tobin-Hochstadt said...

As */* points out, this blog post is thoroughly confused. See my response here: https://mail.mozilla.org/pipermail/es-discuss/2011-September/016663.html and Wes Garland's response here: https://mail.mozilla.org/pipermail/es-discuss/2011-September/016664.html

Andrea Giammarchi said...

because concat *is* the fastest concatenation over more than 3 strings and because + operator requires brackets for ternary or conditional logic while concat, in between commas, does not

Andrea Giammarchi said...

Sam I will, you have a look into my reply too please

Christopher said...

@Andrea — now I know. You are Mr. JS Guru.