Because my blogspot blog has been continuously blocked in the country
I live in since May 2009, I've been posting entries here...
19 March 2010
Strach parser
I've posted a wiki page showing some Strach combinator parsers (written in Groovy++) working with dynamic Groovy's ASTBuilder.
The operators are only tentative. Here's hoping the Codehaus Groovy developers will allow some more intuitive operators similar to those in regex, PEG, or Scala.
8 March 2010
Groovy++
About half a year ago, I began to drift away from the Groovy 1.x Language, lured by the benefits of Scala. I even seriously considered moving my Groovy Language brand to a Scala-based engine, building the Groovy syntax on top of that. But lately, I've been exploring the benefits of Alex Tkachman's Groovy++, with its @Typed and @Trait annotations, and even static type inference. It's brought my heart back to the true Groovy Language, with renewed vigor to fulfill the Groovy Language vision. Scala, like Haskell and Scheme and many other languages, holds wonders, but thanks to Groovy++, I've left my wayward path to return to the true Groovy Language, in awe of what other unseen miracles those AST transformations could bring!
I've turned the Groovy site at Codeplex into the official Groovy Language dashboard. I hope to expand it and keep it up to date to acknowledge everyone who's made a significant contribution to developing the Groovy Language programming ladder.
3 March 2010
Groovy Ceasefire
I have a dilemma when creating a custom lexer/parser over Groovy's AST. The Groovy Language builders let me build the AST using the very readable builder syntax, while the Scala Language combinator parsers let me define my custom syntax using easy-to-read infix operators. The upcoming Scala 2.8 upgrades these to time-efficient packrat parsers, which also allow left-recursive definitions, enabling expression paths to be defined declaratively.
But my dilemma is I can't use both styles of syntactic shortcut in one program. What I want to code is something like this mixed Groovy/Scala style pseudocode:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.matching.Regex
import org.objectweb.asm.Opcodes._
import org.codehaus.groovy.ast._
import org.codehaus.groovy.ast.stmt._
import org.codehaus.groovy.ast.expr._
import org.codehaus.groovy.ast.builder.AstBuilder
import java.security._
import groovy.lang.GroovyClassLoader
import org.codehaus.groovy.control._

object ScalaCallingNodes extends Application with RegexParsers{
  def whitespaceLexer = (" " | "\r\n" | "\r" | "\n")*
  def nameParser = whitespaceLexer ~> "[A-Za-z_][A-Za-z_0-9]*".r
  def nameParser(name:String) = whitespaceLexer ~> name
  def stringParser(open:Regex, close:Regex) =
    whitespaceLexer ~> open ~> ("[^" + close + "]*").r <~ close
  def symbolParser(symbol:String) = whitespaceLexer ~> symbol

  // println 'text'  becomes  this.println('text'), built with Groovy builder syntax
  def printlnParser= nameParser("println") ~ stringParser("'"r, "'"r) ^^ {case name ~ paramText=>
    new AstBuilder().buildFromSpec{
      expression{
        methodCall{
          variable "this"
          constant name
          argumentList {
            constant paramText
          }
        }
      }
    }
  }

  // buildFromSpec returns a list of nodes, so add its first (only) statement
  def blockParser(open:String, close:String) =
    symbolParser(open) ~> printlnParser <~ symbolParser(close) ^^ {case println=>
      var block= new BlockStatement([], new VariableScope())
      block.addStatement(println[0])
      block
    }

  def paramParser= symbolParser("(") ~> nameParser("String") ~> symbolParser("[") ~>
    symbolParser("]") ~> nameParser <~ symbolParser(")")

  def mainMethodParser= nameParser("public") ~> nameParser("static") ~> nameParser("void") ~>
    nameParser("main") ~> paramParser ~ blockParser("{", "}") ^^ {case paramName ~ block=>
      var methods= new AstBuilder().buildFromSpec{
        method('main', ACC_PUBLIC | ACC_STATIC, Void.TYPE) {
          parameters{
            parameter paramName: String[].class
          }
          exceptions{}
        }
      }
      methods[0].code= block   // a MethodNode holds its body in its "code" property
      methods[0]
    }

  def classParser= (nameParser("public") ~> nameParser("class") ~> nameParser <~ symbolParser("{")) ~
    mainMethodParser <~ symbolParser("}") ^^ {case name~method=>
      var classes= new AstBuilder().buildFromSpec{
        classNode name, ACC_PUBLIC, {
          classNode Object
          interfaces{
            classNode groovy.lang.GroovyObject
          }
          mixins{}
          genericsTypes{}
        }
      }
      classes[0].addMethod(method)
      classes[0]
    }

  var classData= """
    |public class MyClass{
    | public static void main(String[] args){
    | println 'Hello, world!'
    | }
    |}""".stripMargin
  var cn= classParser(new CharSequenceReader(classData)).get
  var cu= new CompilationUnit(CompilerConfiguration.DEFAULT, null, new GroovyClassLoader())
  cu.addClassNode(cn)
  cu.compile()
}
If I code this in Scala, I must build the AST using normal method calls, and build the arrays and Groovy lists in longhand. If I code this in Groovy, I must use the quoted prefix-style to call the Scala-based parsers, and I don't think I can even code the Scala Functions in Groovy. To handle the complexity of parsing experimental syntax to the Groovy AST, I need to use both Scala-style infix method calls and Groovy-style builder syntax so as to bring enough syntactic tersity to make such large-scale experimentation possible. But they're not both available in either language.
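The infix-operator style can be made concrete without any library. Here's a minimal sketch in plain Scala that builds combinators like ~, ~>, <~, | and ^^, then uses them to parse the println statement from the pseudocode above.

It's a hypothetical toy. The names P, lit, raw, quotedBy, Parsers and MethodCall are my own inventions for illustration, not the scala.util.parsing.combinator API. It also produces a plain Scala case class rather than a Groovy AST node.

```scala
import scala.util.matching.Regex

// Hypothetical toy combinators (NOT the scala-parser-combinators API): a parser takes
// the input text and a start offset, and returns the parsed value plus the next offset.
final case class P[+A](run: (String, Int) => Option[(A, Int)]) {
  // Sequence, keeping both results as a pair.
  def ~[B](that: => P[B]): P[(A, B)] = P { (text, pos) =>
    run(text, pos).flatMap { case (a, p1) =>
      that.run(text, p1).map { case (b, p2) => ((a, b), p2) }
    }
  }
  // Sequence, keeping only the right-hand result.
  def ~>[B](that: => P[B]): P[B] = (this ~ that) ^^ (_._2)
  // Sequence, keeping only the left-hand result.
  def <~[B](that: => P[B]): P[A] = (this ~ that) ^^ (_._1)
  // Ordered choice: if this parser fails, try the alternative at the same offset.
  def |[B >: A](that: => P[B]): P[B] = P { (text, pos) =>
    run(text, pos).orElse(that.run(text, pos))
  }
  // Transform the parsed value.
  def ^^[B](f: A => B): P[B] = P { (text, pos) =>
    run(text, pos).map { case (a, p) => (f(a), p) }
  }
}

// Hypothetical result node for this sketch (not Groovy's MethodCallExpression).
final case class MethodCall(name: String, argument: String)

object Parsers {
  private val ws: Regex = "\\s*".r

  private def skipWs(text: String, pos: Int): Int =
    ws.findPrefixMatchOf(text.substring(pos)).map(pos + _.end).getOrElse(pos)

  // Skip leading whitespace, then match a literal token.
  def lit(s: String): P[String] = P { (text, pos) =>
    val start = skipWs(text, pos)
    if (text.startsWith(s, start)) Some((s, start + s.length)) else None
  }

  // Match a regex at the current offset, WITHOUT skipping whitespace (for string bodies).
  def raw(r: Regex): P[String] = P { (text, pos) =>
    r.findPrefixMatchOf(text.substring(pos)).map(m => (m.matched, pos + m.end))
  }

  // Text between two identical quote characters, like stringParser in the pseudocode.
  // Only meant for quote characters: q is not escaped inside the regex.
  def quotedBy(q: String): P[String] = lit(q) ~> raw(("[^" + q + "]*").r) <~ lit(q)

  val quoted: P[String] = quotedBy("'") | quotedBy("\"")

  // `println 'text'` becomes MethodCall("println", "text").
  val printlnCall: P[MethodCall] =
    lit("println") ~ quoted ^^ { case (n, body) => MethodCall(n, body) }

  // Succeed only if the parser consumes all input (apart from trailing whitespace).
  def parseAll[A](p: P[A], text: String): Option[A] =
    p.run(text, 0).collect { case (a, end) if skipWs(text, end) == text.length => a }
}
```

Scala assigns operator precedence by an operator's first character. So ~ and ~> bind tighter than <~, which binds tighter than ^^, which binds tighter than |. That is why lit(q) ~> raw(...) <~ lit(q) ^^ f groups the way a regex reader would expect, with no extra parentheses.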
So what's the solution? After several years of experimentation, I believe programming languages should be statically typed, with inference, but allow dynamic typing when required. Running Groovy-style builders requires dynamic typing, while combinator parsers should be statically typed for efficiency. Perhaps one day, Scala will bring optional open classes, enabling builders, but that's probably years away, if at all. Perhaps someone could strip the dynamic-typing engine out of Groovy, and retrofit it into Scala, but the Scala developers would need to tweak the Scala syntax to allow it: this doesn't sound likely.
On the Groovy side, the best option is for Groovy to have a static mode, with mixins (traits), pattern matching, and infix symbol definitions, enabling Scala-style combinator parsers to be defined. Failing that, infix calls of symbol-named methods would allow me to call Scala's parsers tersely. Groovy should soon enable both infix calls of Scala's symbol-named methods and definitions of Scala Functions, so developers can experiment with how well we can combine combinator parsers with builders.
But even better would be for Groovy to get a statically-typed mode. It just so happens Alex Tkachman is building Groovy++, a statically-typed JVM-based language with Groovy's syntax. He once wrote he thought a static version of Groovy should have Scala-style traits. Groovy 1.x project leader Guillaume Laforge wants pattern matching in dynamic Groovy, and if it comes, Groovy++ would follow suit to maintain syntactic compatibility. Groovy would then only need to enable symbol-named methods and infix calls of them, so someone could clone the Scala packrat combinator parsers in Groovy++.
The symbols the Scala developers have chosen for their combinator parsers have the potential to become standardized. Many of them follow the (de facto) standardized regex symbology, and regexes can easily be embedded between the parser symbols: together, the regexes and parsers would enable a terse context-sensitive parsing language which could become a standard. The Groovy Language should support this effort by using the same symbols for its combinator parsers.
When I'm able to tersely packrat-parse code and build a Groovy AST using builders, I can then easily experiment with designing my own custom language syntax for Groovy. A few years ago, I wrote some online notes for Java newbies learning Groovy, intending to discover the most commonly used JDK methods, so I would know which ones to create syntactic shortcuts for. And of course, I want to use CJK tokens in the custom syntax, for even more tersity.
Call for a ceasefire
But there's one major problem: the Groovy 1.x and Groovy++ project managers are at odds with each other, jockeying to protect and promote their positions in the Groovy ecosystem hierarchy! This behavior isn't good for the Groovy Language. The dynamic Groovy 1.x and the static Groovy++ need to merge into one distro, and both project managers need to give a little so the Groovy Language can move forward. I feel a little responsible for calling for this to happen because of my unique association with the Groovy Language brand. When I first came across Groovy 1.x, I assumed the role of underwriter. But lately, it's finally dawned on me that not only am I the underwriter for Groovy, but I actually own the Groovy Language brand. On that infamous day in Dec 2005 when James Strachan left the developer team, the project management baton passed to Guillaume Laforge, but the brand ownership passed to me!
So I'm calling on the Codehaus despots and Alex Tkachman to think of the future of the Groovy Language in their negotiations. Its future is more important than any petty quarrels over financial placing in the ecosystem. I'm the Groovy Language brand-owner, and Groovy's my middle name, so I'm more interested in advancing the potential of Groovy than in milking the largest possible share of the pie, and I'm calling for all parties to think of the future of the groovy Groovy Language, both static and dynamic, so developers everywhere will marvel at its power.
After Groovy++ and Groovy 1.x are shipping in one distro, Groovy 1.x can then be written in Groovy++. Soon after, Groovy will bring mixins and pattern matching; someone will then clone Scala's packrat parsers; Groovy will enable infix method calls; then I and others can experiment with terse parsing and AST building, creating a terser programming syntax to the Groovy AST, using all Unicode characters! Groovy's future is on the line: can you see the vision?
2 February 2010
Groovy being GPL'd?
In an interview with Andres Almiray, Groovy 1.x developer Alex Tkachman explains his latest creation, Groovy++, a static add-on to the dynamic Groovy Language 1.x. Simply by adding the @Typed annotation, the annotated code will compile statically. I've been wanting this addition to the Groovy Language for a long time, and in my 9 July 2009 blog entry, even talked of moving Groovy's primary implementation from the Groovy 1.7 AST to the Scala 2.8 parse tree. Alex called this crazy talk from a crazy person, but then he saw the light and began creating the statically-compiled Groovy++.
Alex writes in the interview:
There are two issues here, which prevent us from open-sourcing the compiler immediately. First of all, it uses several pieces of technology, which our company uses and plans to use in our commercial products. It was not critical when project started as experiment but now we need to extract these parts and replace/rewrite with proper open-source alternatives. The second problem is interesting by itself. We are talking with several well-known vendors about their involvement with the project. There is no much sense in finalizing exact OSS license before these discussions are not completed and we are sure that all interests are well covered. Something interesting is coming and I wish I could tell you more right now.
What are these several pieces of non-open-sourceable technology? And how did Alex code it all up so quickly? Did he use Sun's OpenJDK as the base for Groovy++? Is Groovy++ simply a more deeply embedded reincarnation of his joint Groovy/Java compiler? Is he threatening to use this irresistible update to Groovy 1.x to fork and take control of the primary implementation of the Groovy Language, even threatening to GPL it? Are these discussions with EMC/VMware/SpringSource and Oracle/Sun???
Groovy 2.0 update
As for my own current experimentation, I'm attempting to build a reasonably fast lexer/parser using Scala 2.8's packrat combinator parsers and the Groovy 1.7 ASTBuilder. The different code elements (e.g. statements, expressions) must be pluggable into the lexer/parser, without nls! everywhere to cater for the implied semi-colons. The format of different lexical elements (e.g. strings, GStrings, even dates) must be definable by annotations in the syntax. I'm hoping the left-recursion allowed by the new Scala 2.8 packrat parsing will even let us plug in path elements in path expressions.
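The left recursion I'm counting on rests on the "seed growing" technique from Warth, Douglass and Millstein's paper on left recursion in packrat parsers. As far as I know, Scala 2.8's PackratParsers is based on it. The idea is to memoize a failure for the rule at a position, then re-parse repeatedly, feeding each result back in, until the match stops getting longer.

Here's a hand-written Scala sketch of that idea for one directly left-recursive rule, expr ::= expr '-' digit | digit, assuming single-digit operands. It isn't the code inside PackratParsers, and it doesn't handle indirect left recursion. LeftRecExpr and LeftRec are hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical sketch of packrat memoisation with seed growing, for ONE directly
// left-recursive rule:  expr ::= expr '-' digit | digit
final class LeftRecExpr(input: String) {
  // Memo table: start offset -> best result so far as (value, next offset), None = failure.
  private val memo = mutable.Map.empty[Int, Option[(Int, Int)]]

  private def digit(pos: Int): Option[(Int, Int)] =
    if (pos < input.length && input(pos).isDigit) Some((input(pos).asDigit, pos + 1)) else None

  // One pass over the rule's alternatives; the recursive call reads the memo table.
  private def body(pos: Int): Option[(Int, Int)] = {
    val leftRecursive = for {
      (left, p1)  <- expr(pos)
      if p1 < input.length && input(p1) == '-'
      (right, p2) <- digit(p1 + 1)
    } yield (left - right, p2)
    leftRecursive.orElse(digit(pos))
  }

  def expr(pos: Int): Option[(Int, Int)] = memo.get(pos) match {
    case Some(cached) => cached
    case None =>
      memo(pos) = None // plant a failing seed so the left recursion terminates
      var best: Option[(Int, Int)] = None
      var growing = true
      while (growing) {
        val attempt = body(pos)
        val progressed = attempt.exists(a => best.forall(b => a._2 > b._2))
        if (progressed) { best = attempt; memo(pos) = attempt } // grow the seed
        else growing = false                                   // stopped consuming more input
      }
      best
  }
}

object LeftRec {
  // Evaluate the whole string, or None if it isn't a complete expression.
  def evalLeftAssoc(s: String): Option[Int] =
    new LeftRecExpr(s).expr(0).collect { case (v, end) if end == s.length => v }
}
```

Because each grown seed is fed back in as the left operand, "9-3-2" groups as (9-3)-2 and evaluates to 4. That is the left-associative reading a right-recursive rule can't give without rewriting.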
Of course, Scala's parsers, with their non-alphanumeric names, don't look very elegant when called from Groovy code, but I can't use Groovy's elegant ASTBuilder calls from within Scala code. A Catch-22! Perhaps one day Groovy will allow Scala's non-alphanumerically-named methods to be called elegantly from within Groovy, so they look like operator calls? Or perhaps one day Scala will allow dynamic variables, like C# 4.0, so we can create builders within Scala, making writing to its AST as elegant as in Groovy and Ruby. Building HTML with builders is more elegant than Scala's embedded HTML text.
My progress in building a new lexer/parser for Groovy 1.x isn't as spectacular as Alex's progress with Groovy++ (assuming he didn't use OpenJDK), but I'm getting there. The Groovy Language must be fully configurable so developers can use any natural language they want in the syntax.
23 January 2010
Groovy Dilemma
In chapter 7 of Steven Pinker's 1994 book The Language Instinct, he gives an example of a perfect right-branching sentence:
Remarkable is the rapidity of the motion of the wing of the hummingbird.
This is parsed in the human brain as shown by the parentheses:
(Remarkable (is (the (rapidity (of (the (motion (of (the (wing (of (the (hummingbird))))))))))))).
"Remarkable" is the subject; the remainder is the predicate. "Is" is the main verb; the remainder is its object (here, called the complement). "The" is the article; the remainder is its referent. "Rapidity" is a phrasal head; the remainder is a prepositional phrase as its tail. "Of" is a preposition; the remainder is its tail in the phrase. And so on. Pinker gives another example that is easy for the brain to parse, one that includes relative and subordinate clauses:
(He gave (the candy (to the girl (that (he met (in New York) while (visiting his parents (for ten days (around Christmas and New Year's)))))))).
He rearranges it so it's far harder for our minds to parse:
(He gave (the girl (that (he met (in New York) while (visiting his parents (for ten days (around Christmas and New Year's)))))) the candy).
The direct object "the candy", coming after the many closing parentheses, forces our short-term memories to keep track of dangling phrases that need particular words to complete them. It seems our brains, unlike computers, can only remember a few dangling branches when parsing sentences.
Perhaps that's why the Lisp code that's easiest for humans to read ends with many closing parens, such as this recursive (though not tail-recursive) sample from chapter 2 of Paul Graham's On Lisp:
(defun our-length (lst)
  (if (null lst)
      0
      (1+ (our-length (cdr lst)))))
Left-branching sentences are also easy for humans to parse. Pinker gives another example with two arrangements, one harder for humans to parse:
((The rapidity (of the motion (of the wing (of the hummingbird)))) is remarkable).
and the other, a perfect left-branching sentence, easy:
(((((The hummingbird)'s wing)'s motion)'s rapidity) is remarkable).
English has just a few left-branching structures, but some languages, such as Japanese, are primarily based on them.
One of the universals in Universal Grammar theory, which both Pinker and Noam Chomsky support, is that if a language has verbs before objects, as English does, then it uses prepositions, while if a language has objects before verbs, as Japanese does, it uses postpositions. Pinker mentions a possible reason this universal holds is so the language can enforce a consistent branching decision, either left-branching or right-branching, so our brains can parse it easily.
Some grammatical English sentences are impossible for our brains to parse simply because there are too many dangling branches. The first of these examples parses in our brains OK, but the other two simply don't:
(The rapidity (that the motion has) is remarkable).
(The rapidity (that the motion (that the wing has) has) is remarkable).
(The rapidity (that the motion (that the wing (that the hummingbird has) has) has) is remarkable).
They do parse in computer languages, though. When I discovered closures in Groovy, I started using this type of unreadable embedding, but I now realize I should be making my code either left-branching or right-branching to make it more readable.
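The same three shapes show up in code. Here's an illustrative Scala sketch, my own example rather than Pinker's, computing "the largest doubled even number" three ways:

- a left-branching method chain
- a right-branching nest of prefix calls whose closing parens pile up at the end
- a center-embedded version where one closure opens in the middle of another expression

The object and method names are made up for the example.

```scala
// Illustrative only: one computation written in three branching shapes.
object Branching {
  val xs: List[Int] = List(3, 8, 5, 2, 6)

  // Left-branching: each step completes before the next begins, read left to right.
  val leftBranching: Int = xs.filter(_ % 2 == 0).map(_ * 2).max

  // Right-branching: nested prefix calls, with all the closing parens at the end.
  def evens(ns: List[Int]): List[Int] = ns.filter(_ % 2 == 0)
  def doubled(ns: List[Int]): List[Int] = ns.map(_ * 2)
  def largest(ns: List[Int]): Int = ns.max
  val rightBranching: Int = largest(doubled(evens(xs)))

  // Center-embedded: an inner fold opens mid-expression inside the outer fold,
  // so the reader must hold the outer phrase open until the inner one closes.
  val centerEmbedded: Int =
    xs.foldLeft(Int.MinValue)((best, n) =>
      (if (n % 2 == 0) List(n * 2) else Nil).foldLeft(best)((b, d) => if (d > b) d else b))
}
```

All three produce 16, but only the first two read without holding dangling phrases in mind.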
16 January 2010
Programming Language Structure
Programming languages have their origin in natural language, so to understand the structure of computer languages, we need to understand natural ones. According to Systemic Functional Grammar (SFG) theory, to understand the structure of language, we need to consider its use: language is as it is because of the functions it's required to serve. Much analysis of the English language has been performed using these principles, but I haven't found much on programming languages.
Functional grammar of natural languages
According to M.A.K. Halliday's SFG, the vast number of options for meaning potential embodied in language combine into three relatively independent components, and each of these components corresponds to a certain basic function of language. Within each component, the networks of options are closely interconnected, while between components, the connections are few. He identifies the "representational" and "interactional" functions of language, and a third, the "textual" function, which is instrumental to the other two, linking with them, with itself, and with features of the situation in which it's used.
To understand these three components in natural languages, we need to understand the stages of encoding. Two principal encodings occur when speech is produced: the first converts semantic concepts into a lexical-syntactic encoding; the second converts this into spoken sounds. A secondary encoding converts some semantics directly into the vocal system, being overlaid onto the output of the lexical-syntactic encoding. Programming languages have the same three-level encoding: at the top is the semantic, in the middle is the language syntax, and at the bottom are the lexical tokens.
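Here's a Scala sketch of a toy arithmetic language that shows the three levels in one small program:

- lex turns characters into tokens (the lexical level)
- parse turns tokens into a tree (the syntactic level)
- eval gives the tree its meaning, here simply a number (the semantic level)

The token and tree types are invented for this example. The lexer silently skips any character it doesn't recognize, which a real one shouldn't do.

```scala
// Lexical level: characters -> tokens.
sealed trait Token
final case class Num(value: Int) extends Token
final case class Op(symbol: Char) extends Token

// Syntactic level: tokens -> tree.
sealed trait Expr
final case class Lit(value: Int) extends Expr
final case class BinOp(op: Char, left: Expr, right: Expr) extends Expr

object ThreeLevels {
  // Numbers and the operators + and *; anything else (e.g. spaces) is skipped.
  def lex(src: String): List[Token] =
    """\d+|[+*]""".r.findAllIn(src).toList.map { t =>
      if (t.head.isDigit) Num(t.toInt) else Op(t.head)
    }

  // Grammar: expr ::= term ('+' term)* ; term ::= number ('*' number)*
  def parse(tokens: List[Token]): Expr = {
    def number(ts: List[Token]): (Expr, List[Token]) = ts match {
      case Num(n) :: rest => (Lit(n), rest)
      case other          => throw new IllegalArgumentException(s"number expected at $other")
    }
    // Fold a left-associative chain of `op` applications.
    def loop(left: Expr, ts: List[Token], op: Char,
             next: List[Token] => (Expr, List[Token])): (Expr, List[Token]) = ts match {
      case Op(`op`) :: rest =>
        val (right, remaining) = next(rest)
        loop(BinOp(op, left, right), remaining, op, next)
      case _ => (left, ts)
    }
    def term(ts: List[Token]): (Expr, List[Token]) = {
      val (first, rest) = number(ts); loop(first, rest, '*', number)
    }
    def expr(ts: List[Token]): (Expr, List[Token]) = {
      val (first, rest) = term(ts); loop(first, rest, '+', term)
    }
    expr(tokens) match {
      case (e, Nil)  => e
      case (_, rest) => throw new IllegalArgumentException(s"unexpected tokens: $rest")
    }
  }

  // Semantic level: tree -> meaning (here, a value).
  def eval(e: Expr): Int = e match {
    case Lit(v)           => v
    case BinOp('+', l, r) => eval(l) + eval(r)
    case BinOp('*', l, r) => eval(l) * eval(r)
    case BinOp(o, _, _)   => throw new IllegalArgumentException(s"unknown operator $o")
  }
}
```

For "1 + 2 * 3", lex yields five tokens. parse builds BinOp('+', Lit(1), BinOp('*', Lit(2), Lit(3))). eval gives 7.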
The representational function of language involves encoding our experience of the outside world, and of our own consciousness. It's often encoded in as neutral a way as possible for example's sake: "The Groovy Language was first officially announced by James Strachan on Friday 29 August 2003, causing some to rejoice and others to tremble."
We can analyze this as two related processes. The first has actor "James Strachan", process "to officially announce", goal "the Groovy Language", instance circumstance "first", and temporal circumstance "Friday 29 August 2003"; the second process is related as an effect in a cause-and-effect relationship, being two further equally conjoined processes: one with process "to rejoice" and actor "some"; the other with process "to tremble" and actor "others".
The interactional function of language involves injecting the language participants into the encoding. A contrived example showing many types of injects: "The Groovy Language was first announced by, of all people, creator James Strachan, sometime in August 2003. Was it on Friday 29th? Could you tell me if it was? Must have been. That august August day made some happy chappies like me rejoice, didn't it?, yeehaaaah, and probably some other unfortunates to tuh-rem-ble, ha-haaah!"
We see an informal tone, implying the relationship between speaker and listener. There are glosses added, i.e. "of all people", "august", "happy chappies like me", "unfortunates", semantic words added, i.e. "creator", semantic words removed, i.e. "officially", sounds inserted, i.e. "yeehaaaah", "ha-haaah", prepended expressions of politeness, i.e. "Could you tell me if", and words spoken differently, e.g. "tuh-rem-ble". Mood is added, i.e. a sequence of (indicative, interrogative, indicative). Probability modality is added, i.e. "must have", "probably". We could have added other modality, such as obligation, permission, or ability. We've added a tag, i.e. "didn't it?". We could have added polarity in the main predicate. What we can't indicate in this written encoding of speech is the attitudinal intonation overlaid onto each clause, of which English has hundreds. Neither can we show the body language, also part of the interactional function of speech.
Natural language in the human brain
A recent article in Scientific American says biologists now believe the specialization of the human brain’s two cerebral hemispheres was already in place when vertebrates arose 500 million years ago, and that "the left hemisphere originally seems to have focused in general on controlling well-established patterns of behavior; the right specialized in detecting and responding to unexpected stimuli. Both speech and right-handedness may have evolved from a specialization for the control of routine behavior. Face recognition and the processing of spatial relations may trace their heritage to a need to sense predators quickly."
I suspect the representational function of language is produced by the left hemisphere of the brain, and the interactional function by the right hemisphere. Because the right side of the brain is responsible for unexpected stimuli, from both friend and foe, perhaps interactional language in vertebrates began as body language and facial expressions to denote conditions relevant to others, e.g. anger, fear, affection, humidity, rain, danger, etc. Later, vocal sounds arose as the voice box developed in various species, and in humans, increasingly complex sounds became possible. The left side of the brain is responsible for dealing with regular behavior, and so allowed people to use their right hand to make sign language to communicate. Chimpanzees and gorillas use their right hands to communicate with each other, often in gestures that also incorporate the head and mouth. The article hypothesizes that the evolution of the syllable in humans triggered the ability to form sentences describing processes involving people, things, places, times, etc. Proto-representational language was probably a series of one-syllable sounds similar to what some chimps can do nowadays with sign language, e.g. "Cat eat son night". Later, these two separate functions of natural language became intertwined in human speech.
Programming language structure
When looking at programming languages, we can see the representational function easily. It maps closely to that for natural languages. The process is like a function, and the actor, goal, recipient, and other entities in the transitive structure of natural language are like the function parameters. In the object-oriented paradigm, one entity, the actor, is like the object. The circumstances are the surrounding static scope, and the relationships between processes are the sequencing of statements. Of course, the semantic domains of natural and programming languages are different: natural languages talk about a wider variety of things, themselves more vague, than programming languages. But the encoding systems are similar: the functional and object-oriented paradigms became popular for programming because between them it's easy for programmers to code about certain aspects of things they use natural language to talk about. The example in pseudocode:
Date("2003-8-29").events += {
  def a = new Instances();
  a[1] = jamesStrachan.officiallyAnnounce(Language.GROOVY);
  a[1].effect = [some: s => s.rejoice(), others: o => o.tremble()];
}
The similarities between the interactional functions of natural and programming languages are more difficult to comprehend. The major complication is the extra participants in programming languages. In natural language, one person speaks, and one or more people listen, perhaps immediately, perhaps later. Occasionally someone is intended to overhear. In programming languages, one person writes. The computer reads, but good programming practice assumes other people will read the code later. Commenting, use of whitespace, and variable naming partly enable this interactional function. So does including test scripts with code. Java/C#-style exception handling enables programmer-to-programmer interaction similar to the probability modality of English verb phrases, e.g. will/definitely, should/probably, might/could/possibly, won't, probably won't.
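The pseudocode example of the representational function can also be sketched in runnable Scala. The types Clause and Actor are hypothetical, invented to mirror the earlier analysis of the Groovy announcement:

- the actor is the receiver object
- the process is the method
- the goal and circumstances are arguments
- the two effect processes are attached afterwards

```scala
// Hypothetical model of a clause's transitive structure (names invented for this sketch).
final case class Clause(
    actor: String,
    verb: String,
    goal: Option[String] = None,
    circumstances: Map[String, String] = Map.empty,
    effects: List[Clause] = Nil)

// Object-oriented reading: the actor is the receiver, each process a method.
final case class Actor(name: String) {
  def officiallyAnnounce(goal: String, circumstances: Map[String, String]): Clause =
    Clause(name, "officially announce", Some(goal), circumstances)
  def rejoice(): Clause = Clause(name, "rejoice")
  def tremble(): Clause = Clause(name, "tremble")
}

object Transitive {
  // "James Strachan first officially announced the Groovy Language on Friday 29 August 2003,
  //  causing some to rejoice and others to tremble."
  val announcement: Clause =
    Actor("James Strachan")
      .officiallyAnnounce("the Groovy Language",
        Map("instance" -> "first", "time" -> "Friday 29 August 2003"))
      .copy(effects = List(Actor("some").rejoice(), Actor("others").tremble()))
}
```

The cause-and-effect link becomes plain data on the first clause. A real program would more likely sequence the effect statements after the announcement.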
Many programming systems allow some interactional code to be separated from the representational code. One way is using system-wide aspects. A security aspect will control the pathway between various humans and different functions of the program while it's running. Aspects can control communication between the running program and different facets of the computer equipment, e.g. a logging aspect comes between the program and a recording medium, a persistence aspect between the program and some storage mechanism, an execution performance aspect between the program and the CPU, a concurrency aspect between the program and many CPUs, a distribution aspect between the program and another program executing somewhere else. Here, we are considering these different facets of the computer equipment to be participants in the communication, just like the programmer. Aspects can also split out code for I/O actions and the program entry point, which are program-to-human interactions. This can also be done by monads in "pure functional" languages like Haskell. The representational function in Haskell is always kept separate from interactional functions like I/O and program entry, with monads enabling the intertwining between them. Monads also control all access between the program and modifiable state in the computer, another example of an interactional function.
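The monad point can be illustrated with a toy IO type in Scala. It's a hypothetical sketch, not Haskell's IO nor any Scala library's:

- greeting is the pure, representational core
- readLine and writeLine only describe the interactional edges
- flatMap threads the core between them

Nothing actually happens until unsafeRun() is called. To keep it testable, the console is simulated with an Iterator and a StringBuilder.

```scala
// Hypothetical toy IO type: an IO[A] only *describes* an interaction producing an A.
final case class IO[A](unsafeRun: () => A) {
  def map[B](f: A => B): IO[B] = IO(() => f(unsafeRun()))
  def flatMap[B](f: A => IO[B]): IO[B] = IO(() => f(unsafeRun()).unsafeRun())
}

object Interaction {
  // Representational core: a pure function with no I/O at all.
  def greeting(name: String): String = s"Hello, $name!"

  // Interactional edges, against a simulated console.
  def readLine(input: Iterator[String]): IO[String] = IO(() => input.next())
  def writeLine(out: StringBuilder, s: String): IO[Unit] =
    IO(() => { out.append(s).append('\n'); () })

  // Program entry: the pure core threaded between two effects.
  def program(input: Iterator[String], out: StringBuilder): IO[Unit] =
    for {
      name <- readLine(input)
      _    <- writeLine(out, greeting(name))
    } yield ()

  // Returns the console output (before running, after running) to show when effects happen.
  def demo(name: String): (String, String) = {
    val out = new StringBuilder
    val io = program(Iterator(name), out)
    val before = out.toString
    io.unsafeRun()
    (before, out.toString)
  }
}
```

Interaction.demo("Groovy") returns ("", "Hello, Groovy!\n"). Building the program performs no I/O; only running it does.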
Textual function of language
The textual function of language in SFG is that which concerns the language medium itself. In spoken natural language, this is primarily the sequential nature of voice, and in written language, the 2-D form of the page. Whereas in natural language theory, the voice-carrying atmosphere and the ink-carrying paper are obviously mediums and not participants, it's more difficult to categorize the difference between them in programming language theory. Because a program is written as much for the CPU as for other human readers, if not more so, we could call the CPU a participant. But then why can't the CPU cache, computer memory, hard-disk storage, and comms lines also be called participants? Perhaps the participants and the transmission medium for natural languages are also more similar than different.
The textual function of language is made up of the thematic, informational, and cohesive structures. Although mainly medium-oriented, they also involve the participants. The thematic structure is speaker-oriented; the informational structure is listener-oriented. The thematic structure is overlaid onto the clause. In English, what the speaker regards as the heading to what they're saying, the theme, is put in first position. Not only clauses, but also sentences, speech acts, written paragraphs, spoken discourses, and even entire novels have themes. Some examples using the lexical items James, to give, programmers, Groovy, and 2003, each putting a different theme in first position:
James Strachan gave programmers Groovy in 2003.
Programmers are who James gave Groovy to in 2003.
The Groovy Language is what James gave programmers in 2003.
2003 is when James gave programmers Groovy.
Given was Groovy by James to programmers in 2003.
In English, the Actor of the representational function's transitive structure is more likely to be separated from the interactional function's Subject and from the Theme in a clause than those two are from each other. I think the textual functions of natural language are far more closely linked to the interactional function than to the representational. Perhaps the right side of the brain also processes such textual structure.
The informational structure jumps from the top (i.e. semantic) encoding level directly to the bottom (i.e. phonological) one in English, skipping the middle (i.e. lexical/syntactic) level. This is mirrored by how programming languages such as Python use the lexical tokens to directly determine semantic meaning. In English, the speech is broken into tone units, separated by short pauses. Each tone unit has the stress on some part of it to indicate the new information. For example, each of these sentences has a different informational meaning (the bold indicates the stresses):
James gave programmers Groovy in 2003.
James gave programmers the Groovy Language in 2003.
James gave programmers Groovy in 2003.
James gave programmers Groovy in 2003.
James Strachan gave programmers Groovy in 2003.
Unlike the thematic structure, the informational structure organizes the tone unit by relating it to what has gone before, reflecting what the speaker assumes is the status of the information in the mind of the listener. The informational structure usually follows the thematic structure, but needn't. English grammar allows the lexical items to be arranged in almost any order so they can be broken up into tone units in any combination. For example, these sentences restructure the clause so it can be divided into two tone units (shown by the comma), each with its own stress, so two items of new information can be introduced in one clause:
James gave Groovy to programmers, in 2003.
As for Groovy, James gave it to programmers in 2003.
In 2003, James gave programmers Groovy.
Programming languages should follow the example of natural languages, and allow developers to structure their code to show both thematic and informational structure. The final textual structure, the cohesive structure, enables links between clauses using various techniques, such as reference, pronouns, and conjunctions. Imperative programming languages rely heavily on reference, i.e. temporary variables, but don't use pronouns very much. Programming languages should also provide developers with many pronouns.
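Some of this is already possible. In this Scala sketch, named arguments let the same call be written with a different participant in first position, a small step toward thematic variation. The _ placeholder acts like a pronoun for "the current element", much as Groovy's implicit closure parameter it does. The give method is hypothetical, made up to mirror the example clauses.

```scala
object Thematic {
  // Hypothetical method mirroring the clause "James gave programmers Groovy in 2003".
  def give(giver: String, recipient: String, gift: String, year: Int): String =
    s"$giver gave $recipient $gift in $year"

  // Named arguments: the writer chooses which participant comes first (the theme)
  // without changing the meaning of the call.
  val themeGiver = give(giver = "James", recipient = "programmers", gift = "Groovy", year = 2003)
  val themeYear  = give(year = 2003, giver = "James", recipient = "programmers", gift = "Groovy")
  val themeGift  = give(gift = "Groovy", year = 2003, recipient = "programmers", giver = "James")

  // Pronoun-like reference: `_` stands for "each year" without naming it.
  val yearsSince: List[Int] = List(2003, 2005, 2010).map(_ - 2003)
}
```

All three calls produce the same sentence. Only the order in which the writer presents the participants differs, and that is exactly the thematic choice.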
Summary
Programming languages initially represented information in the same way humans do, using transitive structures such as function calls, joined by logical relationships such as blocks and class definitions. Interactional aspects of code were initially intertwined, but could be separated out using aspects and monads. Enabling different textual structures in programs isn't very widespread, so far limited to providing different views of an AST in an IDE, only occasionally allowing "more than one way to do things" at the lexical level. When used well, textual structures in code enable someone later on to more easily read and understand the program.
In promoting the benefits of programming languages enabling different textual structures, I think it's useful to narrow down to two primary structures: the transitive and the thematic, as these two are easiest to communicate to programmers. See my earlier thoughts on how a programming language can enable more thematic variation. Programming languages of the future should provide the same functions for programmers that natural languages provide for humans.
And of course, I'm building Groovy 2.0, which will both enable thematic variation in the language syntax/morphology and supply a vast vocabulary of Unicode tokens for names. The first iteration will use Groovy 1.x's SwingBuilder and ASTBuilder, along with my own Scala-based combinator parsers, to turn Groovy 2.0 source into Groovy 1.x bytecode. The accompanying Strach IME will enable programmers to enter the Unicode tokens intuitively. Groovy 2.0 will break the chains of the Antlr/Eclipse syntactic bottleneck over Groovy 1.x!!!
1 January 2010
Bust Groovy open. Set it free!
Ever since the core dynamicity and syntactic enhancements of Groovy 1.0 beta 1 over Java, the Groovy Language has been adding functionality upon functionality. The Groovy developers at Codehaus have taken one core technology that benefits Java developers, i.e. the meta-object protocol, and used it as a hook to hang on a closetful of their own versions of widely available JVM-based technologies. Groovy 2.0 at Codeplex will be the version of Groovy that strips away such functionality, aiming to provide developers with the features of Groovy that benefit us the most. The bundled tools will be dropped, as they duplicate functionality available in other JVM languages. The Antlr-based lexer/parser will be removed, so we can interact with the AST directly from other languages. The DGM (default Groovy methods) will be stripped out, so Groovy AST users can instead use the richer classes from languages like Scala.
Lately I've discovered the Groovy AST is inconsistent in its functionality: syntactic features coded earlier are implemented above the AST level, but much of the more recently coded functionality that could have been done above the AST is instead done beneath it. In this way, as well as by supplying an ASTBuilder that can only be used from within the Groovy Language itself, the Codehaus cartel are tying programmers into all their own added cruft, so they can sell us the book and charge us consulting fees down the line. I'll dig under the AST and scalpel out any function that can be done in other languages. When Java 7 brings closures, I'll totally replace Groovy's implementation with Java's.
During 2010, I intend to free the Groovy Language from its Codehaus chains: to bust it open and reveal its core essence, the kernel that benefits programmers most, so they can use it simply from other, superior JVM-based languages, and so my own Strach IME and lexer/parser can use it to provide developers with a terse grammar that uses all Unicode tokens in its vocabulary. Soon after Groovy 1.8 beta 1 comes out, I'll release a stripped-down version consisting only of the core essence, an AST directly controlling the MOP, to provide the JVM's answer to Microsoft's DLR: an AST that all JVM language implementers can build a dynamic language on top of. I'll develop a process so that with each Groovy release, I can quickly release that release's MOP as a standalone component. The Groovy Language Runtime will move from "open" source to open source. Set Groovy free!
23 December 2009
"Groovy 2010" coming ???
Groovy 1.7 is out, "in time for Christmas", and planning for v 1.8 has begun, including a new module system.
Groovy 1.x project leader Guillaume Laforge says "we would like to make a first beta of 1.8 in February or so, with a target final date for the end of the year - we love Christmas gifts". Does that mean he's going to rename it "Groovy 2010"? He once threatened to rebrand Groovy 1.6 as "GroovyX". Microsoft once tried that trick with Windows, but has since reverted to numeric versioning. Let's hope the Codehaus developers don't learn that lesson the hard way.
The new module system proposes putting Swing, XML, SQL, JMX, Beans, etc into separate modules, but the core will still be a tangled ball of many functionally different components. Everything that sits above the AST could be separated out, to encourage developers to put their own syntax on top of the AST. I'm experimenting with an alternative lexer/parser, called the "Strach" component of Groovy 2.0, aiming for greater tersity, yet retaining clarity. Experimentation is good for programming languages.
Another separable component is the Default Groovy Methods (DGM). These methods are compulsory in Groovy: if you want to use Groovy's meta-object protocol (MOP), you must also use these methods. The meta-object protocol allows programmers to add, and subsequently remove, methods on the fly, but forces these default methods on us. What if we just want to use the MOP, without the DGM? I'll be providing a component, to be called the "Wilson" component, that gives the option not to add those methods to classes, and will even let us hide default Java methods.
Programmers aren't silly: why can't we use the feature of the Groovy Language that really benefits us, i.e. the MOP, without having other cruft, such as the syntax and DGM, shoved on us as well? Groovy 2.0 will sit atop Groovy 1.7, giving more choices to programmers.
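To make the MOP/DGM distinction concrete, here's a toy sketch in Java. It is not Groovy's actual MetaClass machinery: it's just a per-object method table where methods can be added and removed at runtime, and where the "default" methods (standing in for the DGM) are opt-in rather than compulsory. The class ToyMetaObject, its methods, and the single default method describe are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A toy meta-object: a table of named methods consulted on every call.
// ToyMetaObject is hypothetical; it is not Groovy's MetaClass API.
class ToyMetaObject {
    private final Map<String, Function<Object[], Object>> methods = new HashMap<>();

    ToyMetaObject(boolean installDefaults) {
        // The "defaults" play the role of the DGM, but here they are optional.
        if (installDefaults) {
            addMethod("describe", args -> "object with " + methods.size() + " methods");
        }
    }

    // Add (or replace) a method on the fly.
    void addMethod(String name, Function<Object[], Object> body) {
        methods.put(name, body);
    }

    // Remove a method again at runtime.
    void removeMethod(String name) {
        methods.remove(name);
    }

    boolean respondsTo(String name) {
        return methods.containsKey(name);
    }

    // Dynamic dispatch: look the method up by name at call time.
    Object invoke(String name, Object... args) {
        Function<Object[], Object> body = methods.get(name);
        if (body == null) {
            throw new UnsupportedOperationException("No method: " + name);
        }
        return body.apply(args);
    }
}

class ToyMetaObjectDemo {
    public static void main(String[] args) {
        ToyMetaObject bare = new ToyMetaObject(false);     // MOP without the defaults
        bare.addMethod("twice", a -> (Integer) a[0] * 2); // add a method on the fly
        System.out.println(bare.invoke("twice", 21));     // 42
        bare.removeMethod("twice");                       // ...and remove it again
        System.out.println(bare.respondsTo("twice"));     // false
        System.out.println(bare.respondsTo("describe"));  // false: no defaults forced on us
    }
}
```

The constructor flag plays roughly the role the proposed "Wilson" component would play. A real MOP also has to handle method resolution across class hierarchies, property access, and interception, none of which this sketch attempts.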
Appendix
Here's the list of AST nodes used in Groovy 1.7, with indentation showing implementation inheritance, that Strach will free up for developers to use directly:
ASTNode
  AnnotatedNode
    ClassNode
      InnerClassNode
        InterfaceHelperClassNode
      MixinNode
    MethodNode
      ConstructorNode
    FieldNode
    ImportNode
    PackageNode
    Parameter
    PropertyNode
  expr/Expression
    expr/ConstantExpression
      expr/AnnotationConstantExpression
    expr/BinaryExpression
      expr/DeclarationExpression
    expr/TernaryExpression
      expr/ElvisOperatorExpression
    expr/BooleanExpression
      expr/NotExpression
    expr/TupleExpression
      expr/ArgumentListExpression
    expr/PropertyExpression
      expr/AttributeExpression
    expr/ListExpression
      expr/ClosureListExpression
    expr/MapExpression
      expr/NamedArgumentListExpression
    expr/ArrayExpression
    expr/BitwiseNegationExpression
    expr/CastExpression
    expr/ClassExpression
    expr/ClosureExpression
    expr/ConstructorCallExpression
    expr/EmptyExpression
    expr/FieldExpression
    expr/GStringExpression
    expr/MapEntryExpression
    expr/MethodCallExpression
    expr/MethodPointerExpression
    expr/PostfixExpression
    expr/PrefixExpression
    expr/RangeExpression
    expr/RegexExpression
    expr/SpreadExpression
    expr/SpreadMapExpression
    expr/StaticMethodCallExpression
    expr/UnaryMinusExpression
    expr/UnaryPlusExpression
    expr/VariableExpression
  stmt/Statement
    stmt/AssertStatement
    stmt/BlockStatement
    stmt/BreakStatement
    stmt/CaseStatement
    stmt/CatchStatement
    stmt/ContinueStatement
    stmt/DoWhileStatement
    stmt/EmptyStatement
    stmt/ExpressionStatement
    stmt/ForStatement
    stmt/IfStatement
    stmt/ReturnStatement
    stmt/SwitchStatement
    stmt/SynchronizedStatement
    stmt/ThrowStatement
    stmt/TryCatchStatement
    stmt/WhileStatement
  ModuleNode
  GenericsType
  AnnotationNode
15 December 2009
Try Groovy, or is it try{Groovy}catch(Exception e){} ???
Run this code in Groovy 1.6 beta 2 or earlier:
try{def a= "abc"; println a}
try{def a= 123; println a+2}
The result:
abc
125
Now run it in Groovy 1.6 RC 2. The result:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: A try block must have at least one try or finally block.
A standalone try block is great for limiting the scope of common temporary variables. Groovy enabled them in versions 1.0 and 1.5, a great improvement over Java. Scala also enables such standalone try blocks, even in the upcoming Scala 2.8.
But in this mailing list reply to me, Guillaume Laforge writes that the preferred way is to use labelled blocks, i.e.
unreferencedUselessLabel: {def a= "abc"; println a}
unreferencedUselessLabel: {def a= 123; println a+2}
What yukky syntax! Perhaps a better change would have been to increase syntactic elegance and tersity by allowing the braces to be omitted when only one statement is in the block, just like with if and while statements:
try println "abc"
try println 123 + 5
Not only that, but in someone's eagerness to restrict programmer choices, they didn't bother checking the error message: "A try block must have at least one try or finally block." (To be fair, the message has since been corrected for Groovy 1.7.)
As the Groovy Language Underwriter, my job is to be ready to continue Groovy Language development should the developers at Codehaus abandon it or change its name. However, they actually seem to be stealthily removing Groovy developers' choices and dumbing down the syntax, especially now that SpringSource and EMC/VMware are bankrolling Groovy Language development, or perhaps bankrolling the lack of it.
Part of my mission in creating an alternative lexer/parser for the Groovy AST is to bring back programmer choices when utilizing the Groovy AST. To quote another freedom fighter, removing programmer power from the Groovy Language syntax is something up with I will not put.
4 December 2009
Groovy 2.0 status report
I grew up in Auckland, New Zealand, living there for 30 years, but never considered creating a scripting language until I'd moved to Melbourne, Australia, 10 years ago. There, I lived in the CBD, an 8-by-8 grid of blocks known as the Hoddle Grid. I went for many walks while there, often thinking of that CBD as a huge chessboard on which to play out games. (I had a passing interest in chess when I was a kid.) The apartment I lived in was on a popular block of the grid, being where the commercial, entertainment, and recreational precincts meet. In the chessboard analogy, that block is the starting position for the white king.
There, I ran a company, GroverServer Ltd, dayjobbing as a programmer to raise funds while working on a process to model company annual reports as Access databases. Although I'd used Access many times since its v1.0 release, when I used it for a complex real-world business task, it wasn't flexible enough. I eventually concluded the VBA scripting language was nothing but a marketing con. The true genesis of Groovy happened soon after this, during a trip I made to England in Feb 2002. I decided a scripting language should run on a VM, Java's being the leading one at the time. It should have a flexible syntax, certainly not like VB's, and must enable AspectJ-style interceptions and introductions. Of course, I never met Groovy Language creator James Strachan while there. He began building Groovy 1.x soon after this, yet another of his many open source projects, recruiting the developers who now control it.
I stayed silent on the mailing list for a year, not wanting them to change the name while they still could easily. But then came that infamous day after DevCon2 in Dec 2005 when James left the development team. Soon after, I posted my very first posting to the Groovy Language mailing list. Six weeks later, Graeme Rocher changed the name of _Groovy on Rails_ to Grails, but it was too late for them to change Groovy's groovy name as well. By then I had learned enough about Groovy to continue its development should the developers abandon it or change its name: I had become Groovy's underwriter.
I decided to get more involved in the language, first by submitting bug reports and change requests. My very first request was for a groupBy method I'd found useful for munging data. Guillaume Laforge must have also thought it was a good method to have, because he created his own similar request a few months later with the same method name. After many similar happenings, I began to realize the Codehaus Groovy developers didn't want me around.
So I decided to branch off on my own, using the Groovy Language AST as an engine to power a different configurable programming language syntax and IME. While working on this, I discovered why statically-typed languages are better than dynamically-typed ones for large systems, so I switched to programming in Scala. At first, I thought I could use the Scala AST as a target instead of Groovy's, but I now realise dynamicity is essential in those few use cases that require it, so I'm back to targeting the Groovy AST, using the Groovy 1.7 ASTBuilder, created by Hamlet D'Arcy. For the static mode, I'll still have to target the Scala AST, though the ideal solution would be for Groovy to add a static mode. However, project manager Guillaume Laforge seems to have vetoed developer Alex Tkachman on this. Years ago, the Codehaus Roadmap for Groovy 3.0 was for it to be written in Groovy, which would have required a static mode, but this idea seems to have been trashed.
I'll still be using Scala as the systems language in building Groovy 2.0. The Codeplex Groovy site will initially distribute "Groovy 1.7 with Strach", where Strach is presently just the lexer/parser, written in Scala, targeting the Groovy 1.7 AST. Eventually, it'll also target the Scala 2.8 parse tree for the static mode (unless, of course, Alex Tkachman succeeds in putting a statically-typed mode into Groovy). When distributing Groovy 1.7 with Strach, I'll experiment with replacing selected java-source classfiles from the Groovy 1.7 jar file with my own scala-source ones. Given enough time, I could even manage to totally rewrite Groovy in Scala using this process!
31 October 2009
Strach IME and Groovy 2.0 making progress
The webpage for the Strach IME has been created. The Groovy page at Codeplex has been repurposed as the primary distro site for the Groovy Language 2.0 next year. Further details are on each of those webpages.
27 September 2009
Scala's groovy stairway
Paul Graham writes in Revenge of the Nerds that Lisp and Fortran are the trunks of two separate evolutionary trees in programming language evolution. He then lists various features of Lisp which have been making their way into languages in the Fortran tree, including dynamic typing. Having tried out many programming languages over the past few years, I now see programming language evolution differently. Unlike Paul, I see dynamic typing as the lack of a feature: static typing is the true feature.
Furthermore, Lisp macros can be thought of as a low-level feature comparable to goto statements and pointers. All three can be replaced by higher-level abstractions. Let's look at some of these abstracted-away low-level features...
Gotos and breaks
The original programming language was of course assembly language. Assembly had the same basic features as machine code, only a little more readable. We could branch to another part of the code based on a data value, and use this to implement conditionals and looping. Algol enabled statements to be grouped into statically scoped blocks. With all these, we could eliminate goto statements. We could also store the program counter in a variable, branch to some different code, then later return to the place we left off: this is subroutine calling. Cobol implements this as a PERFORM statement, and BASIC as GOSUB. Fortran enabled subroutine parameters; Algol enabled return values; C and Lisp brought recursively-called subroutines; and Scheme brought closures. C++ and Java implemented exception throwing, giving better control flow. Scala, which also has closures and exception throwing, omits the break and continue keywords because they're incompatible with closures passed around as values, and in version 2.8 re-implements break with exceptions.
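As a rough illustration of that last point, here's a sketch in Java (assuming Java 9 or later, for List.of) of break rebuilt from exceptions, loosely modelled on Scala 2.8's scala.util.control.Breaks. The Java names Breaks, breakable and doBreak are hypothetical. Because the jump is an exception rather than a keyword, it still works from inside a closure passed to another method such as forEach, where an ordinary break isn't allowed.

```java
import java.util.List;

// "break" implemented with an exception. Breaks, breakable and doBreak are
// illustrative names, not a real Java library.
final class Breaks {
    // A preallocated exception used purely for control flow.
    private static final class BreakControl extends RuntimeException {
        BreakControl() {
            super(null, null, false, false); // no message, no cause, no stack trace
        }
    }

    private static final BreakControl BREAK = new BreakControl();

    private Breaks() {}

    // Runs body; a doBreak() anywhere inside it, even within a closure
    // handed to some other method, jumps to just after this call.
    static void breakable(Runnable body) {
        try {
            body.run();
        } catch (BreakControl e) {
            // swallowed: execution continues after breakable(...)
        }
    }

    static void doBreak() {
        throw BREAK;
    }
}

class BreaksDemo {
    public static void main(String[] args) {
        StringBuilder seen = new StringBuilder();
        // forEach takes a closure, so a plain 'break' keyword can't be used inside it.
        Breaks.breakable(() -> List.of(1, 2, 3, 4, 5).forEach(n -> {
            if (n > 3) Breaks.doBreak();
            seen.append(n);
        }));
        System.out.println(seen); // 123
    }
}
```

The exception is allocated once and carries no stack trace, which keeps each break cheap.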
Pointers and objects
Cobol brought static typing. C enabled static typing for pointed-at data. Simula and Smalltalk introduced objects. Different inheritance models were tried out: C++ used multiple inheritance, Java and C# used single implementation inheritance, Self and JavaScript used the prototype model, while Ruby and Scala used the flexible mixin model. Ruby also has open classes, at the cost of eliminating static typing. Lisp, Ruby, and Java/C# had garbage collection. By using objects everywhere, a language no longer needs pointers.
Human interface
3rd generation languages enabled more meaningful names, making code more readable, but longer. Fortran, C/C++, and Java/C# brought operator precedence, eliminating many parentheses and so shortening the code again. Scala simplified the rules for this. In Lisp and Scala, statements are also expressions, returning a value. Interactive Python's magic underscore is a simple way to pass a statement's value onwards. APL, and its successors J and K, brought greater tersity through a larger vocabulary of tokens. Matlab and R continue along this path for math and stats. Perl enabled thematic variation, bringing "more than one way to do it". Smalltalk was programmed in a built-in visual environment, as were spreadsheets and IDEs, all using color. Declarative paradigms, like those of Snobol, regexes, and Prolog, make a program more readable. Indenting was used by Cobol and extended by Python. Haskell offers the choice of C-style or indent-style bracketing.
Efficiency and concurrency
Pure Lisp is very inefficient, but nowadays various data structures are built in. Numbers were always hard-coded, direct-access arrays were added later, and in Scala even types that take part in the object inheritance hierarchy, such as Int, compile to built-in primitive types. Java brought threads for concurrency, while Erlang and Scala brought the safer, higher-level actor model.
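For comparison with raw threads, here's a minimal actor sketch in Java: an object whose state is touched only by its own thread, and which other threads can reach only by dropping messages into its mailbox. It isn't Erlang's or Scala's actor API; CounterActor and its message classes are hypothetical, and a real actor library would add supervision, scheduling over a thread pool, and so on.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

final class CounterActor {
    // Messages are the only way in; the actor's state is never shared.
    private interface Message {}
    private static final class Add implements Message {
        final int amount;
        Add(int amount) { this.amount = amount; }
    }
    private static final class Get implements Message {
        final CompletableFuture<Integer> reply;
        Get(CompletableFuture<Integer> reply) { this.reply = reply; }
    }
    private static final class Stop implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::loop);
    private int count = 0; // touched only by the worker thread, so no locks needed

    CounterActor() {
        worker.setDaemon(true);
        worker.start();
    }

    // Process one message at a time, in arrival order.
    private void loop() {
        try {
            while (true) {
                Message m = mailbox.take();
                if (m instanceof Add) {
                    count += ((Add) m).amount;
                } else if (m instanceof Get) {
                    ((Get) m).reply.complete(count);
                } else if (m instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void add(int amount) { mailbox.add(new Add(amount)); }

    // Ask for the current count; the answer arrives asynchronously.
    CompletableFuture<Integer> get() {
        CompletableFuture<Integer> reply = new CompletableFuture<>();
        mailbox.add(new Get(reply));
        return reply;
    }

    void stop() {
        mailbox.add(new Stop());
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class CounterActorDemo {
    public static void main(String[] args) {
        CounterActor counter = new CounterActor();
        for (int i = 1; i <= 10; i++) counter.add(i);
        System.out.println(counter.get().join()); // 55
        counter.stop();
    }
}
```

Because only the worker thread ever reads or writes count, the class needs no locks; the BlockingQueue does all the synchronization.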
Macros and laziness
Lisp enabled macros to control evaluation in code. Scheme enabled lazy evaluation and Haskell made it compulsory, eliminating much of the need for macros. Scala gives the choice of strict or lazy evaluation, in a statically-typed language. Better compilers can automatically detect and inline code that would normally require programmer control with macros. AspectJ-style aspects and Haskell-style monads also allow code to be self-referenced and manipulated in a program.
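Here's a small Java sketch of why laziness reduces the need for macros: if arguments are passed unevaluated, an ordinary function can act as a control structure. Java has no by-name parameters, so java.util.function.Supplier stands in for Scala's "=> T" parameters here, and the Lazy class approximates a Scala lazy val. The names LazyDemo, ifThenElse, Lazy and expensive are hypothetical.

```java
import java.util.function.Supplier;

final class LazyDemo {
    static int evaluations = 0; // counts calls to expensive()

    // An ordinary method used like a control structure: only the chosen
    // branch's thunk is ever run.
    static <T> T ifThenElse(boolean cond, Supplier<T> thenPart, Supplier<T> elsePart) {
        return cond ? thenPart.get() : elsePart.get();
    }

    // Roughly what a Scala lazy val does: compute on first use, then cache.
    static final class Lazy<T> implements Supplier<T> {
        private Supplier<T> init; // null once the value has been computed
        private T value;

        Lazy(Supplier<T> init) { this.init = init; }

        @Override
        public synchronized T get() {
            if (init != null) {
                value = init.get();
                init = null; // drop the thunk so it can't run again
            }
            return value;
        }
    }

    static int expensive() {
        evaluations++;
        return 42;
    }

    public static void main(String[] args) {
        // The else branch would throw, but it is never evaluated.
        int x = ifThenElse(true, () -> 1, () -> { throw new IllegalStateException("not lazy"); });
        System.out.println(x); // 1

        Lazy<Integer> answer = new Lazy<>(LazyDemo::expensive);
        System.out.println(evaluations); // 0: nothing computed yet
        System.out.println(answer.get() + answer.get()); // 84
        System.out.println(evaluations); // 1: computed only once
    }
}
```

In a language with a macro system, ifThenElse would typically be written as a macro so its arguments aren't evaluated eagerly; passing thunks gets the same effect with a plain function.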
There are different tradeoffs between these feature sets, and creating a programming language that combines them is difficult. ML, and its successors Haskell, Caml, and F#, achieved this when combining static typing with functional programming. OCaml and Scala successfully combined object-oriented programming with functional programming.
IDEs generally build on the lexical structure of a programming language. The Scala language compiler is designed as a stairway of increasingly higher abstractions. Near the top is the parse tree, one step short of the lexical structure. I'm attempting to build on top of this parse-tree layer of Scala 2.8. I want to add APL/J/K-style tersity to the syntax, including the ability to use a foreign language (simplified Chinese) everywhere in my Scala code. When done, I'll release this language as "Groovy 2.0".
9 September 2009, 9:09:09pm
Groovy life and death
Rick Dillon recently posted this analysis of programming language evolution. The traditional imperative languages like C/C++, and the newer ones like Java and C#, are statically typed, while the traditional functional languages like Lisp/Scheme, and the newer semi-functional ones like Python, Ruby, and Javascript, are dynamically typed. He gives a code sample implementing functional currying in statically-typed Java, which turns out to be quite verbose because of the explicit static types. To put static typing into functional programming while retaining tersity, we require type inference. ML/Caml and Haskell are examples of such languages, and OCaml/F# and Scala are object-oriented examples. Seen this way, dynamic languages are a dead end in programming language evolution. Instead of maintaining the "systems language / scripting language" duo, future language evolution will go along the "functional language with inferred static typing" route.
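To see the kind of verbosity being described, here's a sketch of currying in Java. It isn't Rick's code: it uses Java 8 lambdas, which came after his post, and the names Currying, curry, add, addFive and add3 are illustrative. Even with lambdas trimming the function bodies, the explicit nested Function types grow with every extra argument, which is exactly the noise that type inference removes.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

final class Currying {
    // Turn a two-argument function into a chain of one-argument functions.
    static <A, B, R> Function<A, Function<B, R>> curry(BiFunction<A, B, R> f) {
        return a -> b -> f.apply(a, b);
    }

    public static void main(String[] args) {
        Function<Integer, Function<Integer, Integer>> add = curry((x, y) -> x + y);
        Function<Integer, Integer> addFive = add.apply(5); // partial application
        System.out.println(addFive.apply(10)); // 15

        // With three arguments, the declared type alone is already a mouthful.
        Function<Integer, Function<Integer, Function<Integer, Integer>>> add3 =
                x -> y -> z -> x + y + z;
        System.out.println(add3.apply(1).apply(2).apply(3)); // 6
    }
}
```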
I suspect many programmers coming to dynamic languages will follow the same path I did in realizing this. They will typically work in Java, C#, Cobol, PHP, and/or VB in their dayjobs. They'll discover Python or Ruby, though for me it was Groovy with the nifty closures and collections. At first, they'll just use it for scripty stuff, then start trying to build bigger and bigger systems. They'll then realise the lack of static typing means they've thrown out the baby with the bathwater. The functional languages with inferred typing will then beckon. Groovy programmers will start learning Scala because it runs on the JVM. At first they'll think that Scala will only replace Java, so they can use Groovy and Scala together, but eventually they'll see that statically-typed functional languages can replace both members of the "systems language / scripting language" duo! (Perhaps some will even say, as I did, that using the Groovy Language started off being useful, but "what began as life to me has now become death"!)
Lately, I've been trying to understand the interplay between different features in these types of languages, such as monads, macros, and mixins:
(1) Monads from Haskell enable computer languages to cleanly split program code into functional-pure and side-effecting components. Aspects, as in AspectJ and Spring, are frequently used in a system-wide manner in non-functional-paradigm languages to separate out certain non-paradigm concerns such as I/O, persistence, exception-handling, optimization, etc, from the primary representational concern. This type of separation between the representational and interactional functions of a programming language mirrors that in natural language, as analyzed in Hallidayan Systemic Functional Grammar theory.
(2) Syntactic macros can provide the most user-configurability at the surface levels, as in Lisp/Scheme and Dylan. Most programming languages provide much power in the engine, then deliberately bottleneck it for the language syntax, only to return it to the programmer at the IDE level. Natural languages don't do this, and I don't think computer languages should, but a programming language syntax is considered a holy grail for marketing the language, so not many languages have dared to allow such syntactic configurability in the past. Perhaps this syntactic component of programming languages mirrors the textual component of natural language in Systemic Functional Grammar.
(3) Scala traits (i.e. mixins) provide a more flexible yet still correct OOP system than either single or multiple inheritance. The Scala website shows how they can be used to cleanly implement the Observer pattern, the very pattern the AspectJ evangelists 10 years ago were saying aspects could easily implement in the non-functional language Java.
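As a rough Java approximation of the trait-based Observer, here's a sketch using an interface with default methods. Java interfaces can carry behaviour but not fields, so unlike a Scala trait the mixing-in class has to supply the observer list itself, and the self() method stands in for a Scala self-type. The names Subject, Sensor and ObserverDemo are hypothetical, loosely modelled on the idea of the Subject/Observer example on the Scala site rather than copied from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Observer behaviour that any class can mix in by implementing this interface.
interface Subject<S> {
    List<Consumer<S>> observers(); // storage provided by the mixing-in class
    S self();                      // stands in for a Scala self-type

    default void subscribe(Consumer<S> observer) {
        observers().add(observer);
    }

    default void publish() {
        S me = self();
        for (Consumer<S> observer : observers()) {
            observer.accept(me);
        }
    }
}

// An ordinary class that gains Observer-pattern behaviour by implementing Subject.
class Sensor implements Subject<Sensor> {
    private final List<Consumer<Sensor>> observers = new ArrayList<>();
    private int reading;

    @Override public List<Consumer<Sensor>> observers() { return observers; }
    @Override public Sensor self() { return this; }

    int reading() { return reading; }

    void setReading(int value) {
        reading = value;
        publish(); // notify every subscriber of the new reading
    }
}

class ObserverDemo {
    public static void main(String[] args) {
        Sensor sensor = new Sensor();
        List<Integer> log = new ArrayList<>();
        sensor.subscribe(s -> log.add(s.reading()));
        sensor.setReading(3);
        sensor.setReading(7);
        System.out.println(log); // [3, 7]
    }
}
```

The need for the observers() and self() boilerplate in every implementing class is precisely what a real trait, which can hold its own state and declare a self-type, avoids.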
No single statically-typed functional language provides all these features, though I don't yet understand them all, or how they relate to each other. I do intend to return to creating a shell over the Scala language parse tree once Scala 2.8 is out, because I think this is the best opportunity to evangelize full Unicode character set programming to the world. The shell will be called GroovyScala.
21 August 2009
A Groovy Undertaking
This mailing list reply from Jochen Theodorou popped up on the Reddit-programming charts recently, probably stage-managed damage control. Jochen wrote: "James (Strachan) is great in initiating projects and gets them to a state where the examples work. But as soon as you go away from the examples and alter them just a tiny bit, it fails." Yeah, that's called test-driven development. The solution: add more tests, then make them work! I'm a great believer in it. Jochen also wrote: "...as an active part (James dumped Groovy) over four years ago already. (...) You can say that current Groovy is Guillaume (Laforge) and me mostly, but many people did come and go, some did contribute a lot... like for example John Wilson, other did only cover a small area." The Groovy developers seem to be positioning James as only one of many Groovy Language "creators". What's up? Is someone else hoping to stand in for language creator James in an upcoming Groovy Language interview in Australian Computerworld?
Why did some developers "only cover a small area"? Perhaps they started getting harassed after surfacing on the Groovy mailing list, as I did three years ago? At the time I thought it was just my name! I knew anyone could've been doing it, but around that time two UK teachers at my university took me to dinner and warned me that "anyone who takes on Google comes off second best". I doubt it was Google who put them up to that, and why would the Groovy FUD-spreaders do so? I suspect the real reason the present developers took control of Groovy was to try and sell it to Google as a brand name fit. But I didn't really understand how I was a threat to them. Early the following year, when Groovy 1.0 was finally released, the licence still clearly said: "4. Products derived from this Software may not be called "groovy" nor may "groovy" appear in their names without prior written permission of The Codehaus. "groovy" is a registered trademark of The Codehaus." I was just piggybacking the Groovy name because I thought it might be a good gimmick if I wanted to return to programming work one day, not for any other reason.
Then one day I was fooling around online, and looked up the U.S. Trademark database for the Groovy Language details. They weren't there! They weren't even in the history of lapsed trademarks. Codehaus was a US-based outfit, weren't they? Beta-1 of Groovy 1.1 was then released with the Apache licence. Groovy's previous licence had only been a bluff! But was that really a reason to harass me? Other programming languages don't trademark their names: I suspect if I changed my name to Scalia Scalow and surfaced on the Scala mailing list, no-one there would feel insecure enough to harass me because of my name. It seems there's an essential difference between languages: Scala is a quality language designed within academia, though intended for business, to bring present Java and C# developers a little closer to functional programming. Groovy is an ad hoc commercial creation, designed to flip the investing companies at a profit, first Bay Partners, then SpringSource, and now VMware. I suspect it's that difference that makes the Groovy Language developers ultra-picky about who's involved in the development.
Jochen also wrote: "Developing a language is a lot of stress. You have to discuss things on an emotional level very often. (...) And many people get tired of these discussions, so did James and so did for example John." Is he priming the community for the next departure, perhaps himself or Guillaume? Because I imagine VMware had more cash in their "cash and stock" offer for SpringSource than SpringSource did in theirs for G2One, perhaps Guillaume's suddenly lost some motivation to continue with (J)Groovy development. And what about me? After 4 years of working on an idea to make programming languages terser using all Unicode tokens, staying in mainland China because it's the home of the simplified Chinese characters, and targeting the Groovy AST because of its ad hoc construction and its groovy name, I confess I'm also getting a little tired of it all. When I first got involved, I never really knew what a truly dirty business open source software development is.
8 August 2009
Groovy futures
The (J)Groovy developers recently released beta-1 of "Groovy 1.7". But will it really be called version 1.7? The developers changed the name of Groovy 1.1 to 1.5 at the last moment, and they might do it again with version 1.7. Besides plucking some stuff out of Spock and ASM, they've begun on inner classes, the main feature from Java still missing from (J)Groovy. And they've dusted off the GroovyScriptEngine, rewriting it, probably as a snipe at my own GroovyScript-branded version of Groovy for the Scala parse tree. I'm not sure how many developer hours SpringSource threw at beta-1, but I suspect not many. They need to keep up the appearance of developing the Groovy Language, while continuing to collect consulting fees, to get a high valuation in their talks with JBoss or whoever it is.
While "Groovy 2.0" has been talked about as the version of (J)Groovy shipping with a new improved MOP, this current 1.7 line might end up with that name, not because it has a new MOP but for marketing reasons only. Perhaps they'll bring out new editions of their books. The Groovy Language release schedule now seems to completely revolve around marketing and training events. But where would the new MOP fit in? If the developers finally manage to do what John Wilson couldn't, what Groovy 1.x botched, in producing a Java-language compatible MOP for the JVM, would SpringSource really want to prewrap it in a programming language and tag it with the "Groovy" brand? I'd think they'd want to pitch it as the JVM's answer to Microsoft's DLR, something like "the Spring DLR for the JVM", and promote it for all JVM-based dynamic languages.
But even without a new MOP, doesn't the Groovy Language 1.x still have a future? It's certainly the language of choice for Grails. For other use cases, such as scripting and testing, it may now be superseded. Before Groovy, developers used Jython. Some pitched Groovy as a better choice because it's Java-syntax compatible, but I don't think they really understood the mindset of a typical corporate programmer. Programmers want to expand their skillsets, so would rather choose JRuby for scripting and testing because it's NOT Java-syntax compatible. JRuby is curriculum vitae compatible for Java developers, being another step up to a Rails job. And what of (J)Groovy's recent push with Griffon? With Grails there was little serious competition for Groovy, but with Griffon, Groovy is up against the might of JavaFX. The recent trend of calling (J)Groovy "Groovy on Grails" may have hit the mark.
As the Groovy Language underwriter, I often think about Groovy's future, both the technology and the brand, and I'm now a little pessimistic. A week after I said I was switching from C# to Scala/JVM for programming to the Groovy Language AST, Groovy Language creator James Strachan (by another total coincidence :-) bought and read the Programming Scala book, and subsequently said he thought Scala was a better choice than Groovy for systems programming. After programming in Scala for a mere month, and that only part-time, I realized Scala is already the language I was trying to modify the (J)Groovy AST to be. There are still a few things Scala lacks, such as syntactic macros, but I've no doubt they'll arrive in a future version of Scala. As for the programming I did do in Scala (building a combinator parsing library), I later discovered a better version already existed in the Scala libraries. Recently I started to wonder why Codehaus (J)Groovy/JVM and Codeplex Groovy/DLR exist when I could target both platforms via the Scala parse tree, but now I'm wondering why I'm programming at all.
With (J)Groovy being too minimal a wrapper for an upcoming "SpringSource DLR for the JVM", with corporate developers preferring JRuby to Groovy for scripty stuff, with Swing already being targeted effectively by JavaFX, and with Scala becoming recognized as the best choice for new systems programming for the JVM, the only role for Groovy in the foreseeable future seems to be as Groovy 1.x for Grails. People will probably just call it "the Grails language". Perhaps the only future for the Groovy brand is as my middle name.
28 July 2009
One Groovy Language to rule them all
Some say that because programmers read code far more than they write it, it's better for a language to have a clean minimal syntax, so we can easily read code others have written. But natural languages don't work that way.
Whatever our native language, we can read many more words than we use when we write, and we can understand in listening many more words than we usually speak. As we learn our native tongue as children, we hear many varieties of it and much vocabulary, learn to understand it quickly, but we seldom reproduce most of it. Even as adults, it doesn't take long when listening to a new flavor and accent of English to understand it, but we take much longer to reliably imitate it, if at all.
I once spent a couple years studying natural language, then returned to programming as a hobby. I'd never liked Perl: the "there's more than one way to do it" philosophy had never appealed; I'd preferred the more minimal syntax of Python or Smalltalk. But when I returned to programming, what Larry Wall's been saying all these years began to make sense. Programming language designers who restrict what the language can do, providing only one way to do things, are like the grammar school English teachers who try to prescribe to their students what correct English is.
Mainframe programmers read lots of Cobol programs to understand their meaning, but don't write much Cobol when maintaining programs. An experienced Cobol programmer can flick through a printout and quickly understand the program. Computer Science students read the C code in the Unix kernel, but seldom change it. They read it so they can read C code easily. We should be able to understand code written by others, not by constraining what others can write, but by gaining more experience in reading what others have written.
But one programmer can only really read code easily in one or two computer languages, just as most people can only learn one or two natural languages really well. For this reason, programmers are categorized by the language they program in.
The Groovy Language will solve this problem by being available on every AST platform. As well as the (J)Groovy flavor for the Groovy/JVM AST, the GroovyScript flavor for the Scala parse tree, and the Groovy-DLR flavor for Microsoft's DLR, the Groovy Language will eventually be available for every other AST platform too. The original (J)Groovy syntax was a close copy of Java's, while being semantically different; Java's was a close copy of C++'s, also semantically different. So someone who knew Java had a head start learning Groovy, and so on.
The Groovy Language will be the end of the line for the C syntax, available for every practical AST, and so will replace other programming language syntaxes. Therefore, if someone learns (J)Groovy, they can then switch to using Groovy-DLR easily, just as when someone learns British English, they can switch to using Indian English easily. There will be one Groovy Language to rule other computer language syntaxes. I guess they'll eventually become obsolete.
27 July 2009
What makes Scala groovy?
Lately, I've been thinking about what makes Scala groovier than (J)Groovy...
(1) Nested classes and packages. I can nest my class definitions any way I want when doodling, i.e. doing experimental programming. Groovy DevCon 5 talked about nested classes for Groovy 1.7. Anonymous inner classes aren't necessary, though, as closures can simulate them.
(2) Pattern matching. Pattern matching is an incremental addition to a programming language that, once learnt, is hard to do without. Groovy 2.0 is slated to bring pattern matching, but some think the coming Groovy 2.0 is a myth, just like JSR 241 and the Groovy language spec.
(3) Combinator parsing. People are bored with the limits of regexes, and want more declarative power in parsing stuff. Scala now has a terse combinator parsing syntax, and Scala 2.8 will introduce the more efficient packrat parsing trait for them.
(4) Consistency of syntax and semantics, e.g. the method/field uniform access principle, as opposed to the tack-on approach of (J)Groovy, which is necessary to ensure seamless backwards-compatibility with Java classes. Returning to Groovy coding after working with Scala's "clean break with Java" design, though, is harder than returning to Java coding after working with Groovy. Scala's operator/method and parameter/indexing dualities are features that could successfully be put into Groovy, though.
(5) Mixins/traits. The (J)Groovy 1.1 (betas) AST had empty stubs for mixins, but the Groovy developers never implemented them. After seeing how Scala traits could do things I thought were only elegant with aspects, e.g. the Observer pattern, I now believe Groovy needs those mixins.
(6) Inferred static typing. After working with this in an IDE, one wonders how the "more tests are better than more typing" lie spread so rapidly. Inferred static typing is "more typing (static) with less typing (fingers on keyboard)".
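A short Scala sketch of points (2), (5) and (6) working together, assuming the Observer pattern mentioned in (5) as the example: observer support lives in a mixin trait, the listener is a pattern-matching function literal, and most types are inferred. `Subject`, `Thermometer`, `Event`, `Reading` and `SensorOffline` are hypothetical names made up for this illustration, not part of any library.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical event types for the sketch.
sealed trait Event
final case class Reading(celsius: Double) extends Event
case object SensorOffline extends Event

// Observer support as a mixin trait: any class that mixes it in can publish events.
trait Subject[E] {
  private val observers = ListBuffer.empty[E => Unit]
  def subscribe(observer: E => Unit): Unit = observers += observer
  protected def publish(event: E): Unit = observers.foreach(observer => observer(event))
}

// Thermometer gains observer support just by mixing in the trait.
class Thermometer extends Subject[Event] {
  def record(celsius: Double): Unit = publish(Reading(celsius))
  def disconnect(): Unit = publish(SensorOffline)
}

val log = ListBuffer.empty[String]   // element type stated once, inferred afterwards
val thermometer = new Thermometer    // type inferred as Thermometer

// The observer is a pattern-matching function literal of type Event => Unit.
thermometer.subscribe {
  case Reading(c) if c > 30 => log += "hot: " + c
  case Reading(c)           => log += "ok: " + c
  case SensorOffline        => log += "offline"
}

thermometer.record(35.0)
thermometer.record(20.5)
thermometer.disconnect()
```

Because `Event` is sealed, the compiler can warn if the observer forgets a case, which is one reason pattern matching is hard to give up once learnt.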
What (J)Groovy features are groovier than Scala's?
(1) Builders. I've heard Ruby copied this feature from Groovy. Has it been done in Scala? Scala's syntax already allows it, though I've yet to see a Scala implementation of Groovy's HtmlBuilder. Scala's inline XML syntax is ugly compared to builder-based syntax.
(2) GStrings. Also known as interpolated strings, they enable us to do much commonly used string handling, e.g. printing, with a terser syntax. Perhaps Scala's scalable syntax could enable these without syntax changes, I don't know.
(3) Dynamic typing. Dynamic typing is useful in the 20% of code where static typing isn't suitable. Just as dynamic Python enhances static C code, dynamic Groovy enhances static Java code. Static languages can emulate some dynamic typing features by typing everything with the Object type, or using an expando object. Open classes that enable inheritance may be impossible to emulate, though. (Groovy also enables built-in static typing which is slower than its dynamic typing. Use this feature for interface documentation only, use Java instead for other static typing requirements.)
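For comparison with point (2): Scala 2.10, released after this post, added string interpolation. The `s"..."` literal form was itself a syntax addition, but each interpolator is an ordinary method on `StringContext`, so new ones can be written as library code. A minimal sketch; the `upper` interpolator is hypothetical.

```scala
// Built-in `s` interpolator: the closest Scala analogue of a Groovy GString.
val name = "Groovy"
val year = 2003
val greeting = s"Hello, $name! First announced in $year, ${2010 - year} years ago."

// A custom interpolator is just an extension method on StringContext.
// `upper` is hypothetical: it upper-cases each interpolated value.
implicit class UpperInterpolator(sc: StringContext) {
  def upper(args: Any*): String = sc.s(args.map(_.toString.toUpperCase): _*)
}

val shout = upper"hello, $name"
```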
It seems Scala could copy (J)Groovy's groovy features far more easily than Groovy could copy Scala's. Although Groovy pitches itself as "complementing, not competing with" Scala, since I started programming in Scala I've yet to find much that Groovy's a more obvious fit for. Scala's tersity and inferred typing are addictive.
What could make both Scala and (J)Groovy groovier?
Self-mutating syntax. This would enable syntactic macros and keyword aliasing, thus putting Scala into the realm of Lisp/Scheme. Some of Scala's syntax looks like it could be redefined as a syntactic macro, e.g. the for comprehension could generate the underlying calls to map, filter, etc. After pulling out these types of simplifications, perhaps Scala's remaining syntax would be easily handled by a library based on Scala's own parser combinators, making Scala syntax self-referential.
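As a rough illustration of the for-comprehension point, here is a comprehension next to the chain of `withFilter`, `flatMap` and `map` calls the Scala compiler rewrites it into. This only shows the shape of that rewrite; the list contents are made up.

```scala
val langs = List("Groovy", "Scala", "Java")
val versions = List(1, 2)

// Sugared form.
val viaFor =
  for {
    lang <- langs
    if lang.startsWith("G") || lang.startsWith("S")
    v <- versions
  } yield lang + " " + v

// Roughly what the compiler generates: an `if` guard becomes withFilter,
// each inner generator becomes flatMap, and the last one becomes map.
val viaCalls =
  langs
    .withFilter(lang => lang.startsWith("G") || lang.startsWith("S"))
    .flatMap(lang => versions.map(v => lang + " " + v))
```

Because the rewrite is purely syntactic, any type with suitable `map`/`flatMap`/`withFilter` methods (a parser, an Option, a Future) can be used in a for comprehension, which is what makes it a candidate for a macro.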
The GroovyScript source code I've posted enables annotations to define lexical and syntactic features of a C-syntax language such as (J)Groovy or Scala, perhaps another way of making it self-referential. It requires using monadic bind and return/value parser combinators, making it a "context-sensitive" grammar. Packrat parsing can do context-free parsing in linear time, though with the cost of memory space: can multicores keep such context-sensitive parsing tractable as well?
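A minimal sketch of why monadic bind matters here, assuming a hand-rolled parser type rather than Scala's library: `flatMap` lets the parser used for the rest of the input depend on a value already parsed, which is what a lexical rule declared by an annotation needs. `Parser`, its helpers and the `@Sep(';')` annotation are hypothetical, and no packrat memoization is attempted.

```scala
// A parser takes the input and an offset, and returns the value and the next offset.
final case class Parser[A](run: (String, Int) => Option[(A, Int)]) {
  // Bind: the parser to run next is computed from the value just parsed.
  def flatMap[B](f: A => Parser[B]): Parser[B] =
    Parser((in, pos) => run(in, pos).flatMap { case (a, next) => f(a).run(in, next) })
  def map[B](f: A => B): Parser[B] = flatMap(a => Parser.pure(f(a)))
  // Succeeds only if the whole input is consumed.
  def parse(in: String): Option[A] =
    run(in, 0).collect { case (a, end) if end == in.length => a }
}

object Parser {
  // Return/value: succeed with `a` without consuming input.
  def pure[A](a: A): Parser[A] = Parser((_, pos) => Some((a, pos)))
  def satisfy(p: Char => Boolean): Parser[Char] = Parser { (in, pos) =>
    if (pos < in.length && p(in(pos))) Some((in(pos), pos + 1)) else None
  }
  def char(c: Char): Parser[Char] = satisfy(_ == c)
  def string(s: String): Parser[String] = Parser { (in, pos) =>
    if (in.startsWith(s, pos)) Some((s, pos + s.length)) else None
  }
  // Zero or more repetitions; stops when `p` fails or makes no progress.
  def many[A](p: Parser[A]): Parser[List[A]] = Parser { (in, pos) =>
    val items = List.newBuilder[A]
    var cur = pos
    var done = false
    while (!done) {
      p.run(in, cur) match {
        case Some((a, next)) if next > cur => items += a; cur = next
        case _                             => done = true
      }
    }
    Some((items.result(), cur))
  }
  def many1[A](p: Parser[A]): Parser[List[A]] =
    for { x <- p; xs <- many(p) } yield x :: xs
  def sepBy1[A, S](p: Parser[A], sep: Parser[S]): Parser[List[A]] =
    for { first <- p; rest <- many(sep.flatMap(_ => p)) } yield first :: rest
}

import Parser._

val word: Parser[String] = many1(satisfy(_.isLetter)).map(_.mkString)

// Reads an annotation such as @Sep(';') and returns the declared separator.
val sepAnnotation: Parser[Char] =
  for {
    _ <- string("@Sep('")
    c <- satisfy(_ => true)
    _ <- string("') ")
  } yield c

// Context-sensitive: how the rest is parsed depends on the separator parsed first.
val annotated: Parser[List[String]] =
  for {
    sep   <- sepAnnotation
    items <- sepBy1(word, char(sep))
  } yield items
```

The same input text `abc;de` parses under `@Sep(';')` but not under `@Sep(',')`, which a grammar fixed in advance can't express without bind.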
22 July 2009
Scala eclipses (J)Groovy
I've posted beta-2 of GroovyScript. GStrings are now parsing. The parser, written in Scala, uses a pushback lexer so lexical tokens can be defined in the parsed syntax using annotations. What's there may be useful for someone to see how a parser with a pushback lexer can work. However, I'm now looking at whether I can rewrite it as an extension to Scala's built-in combinator parser library, so don't expect anything more for a while.
The more I program in Scala, the more convinced I become that it's the grooviest Groovy Language of all. Scala's lexing and syntax need to be more customizable, though, which is what GroovyScript's all about: adding an alternative lexer/parser to the Scala parse tree to make the syntax self-referential, thus enabling syntactic macros and keyword aliasing. So a few weeks ago, I decided to switch the primary reference implementation for the Groovy Language from (J)Groovy to GroovyScript. I've still got a lot of learning and work to do though. Just as I programmed in Groovy for a year before surfacing on their mailing list, it will probably take that long or longer before I have much to contribute to Scala.
9 July 2009
The grooviest Groovy of all!
I've posted beta-1 of GroovyScript online. It's a lexer and parser with just-in-time pushback lexing, with an Apache licence, written in Scala. When the parser backtracks, it pushes unused tokens back into the lexer. Hence we can write a lexer/parser that lets lexical rules be defined as annotations using regexes in the parsed code. The following code snippet parses correctly in beta-1:
abc;
@AddComment('//[^\r\n]*') try{
  defg; //hi!!!
  987;
  @Anno try{
    @DoIt(7, 'abc',) zyx;
    16.8
  };
  'bcdefg';
};
@Anno @Letter hijk;
lmnop;
The @AddComment annotation enables //-comments to be recognized as whitespace within its tagged statement, and eventually all external files parsed from within, but not before or after the tagged statement. That's all that's working for now, but I'll eventually put in everything I blogged about in my last post, e.g. custom lexing rules, syntactic macros, name aliasing, a Unicode IME. Scala certainly proved its worth for this challenging exercise, making me think about the best path forward for GroovyScript, the 3rd language in the Groovy Programming Language family, after (J)Groovy and Groovy-DLR. As a result, I'm switching the GroovyScript target platform from the (J)Groovy AST to the Scala 2.8 parse tree.
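The just-in-time pushback idea can be sketched like this, under simplifying assumptions: tokens are lexed one at a time, on demand, with whatever rules are active at that moment, and pushing a token back rewinds to its start offset so it is lexed again under any rules added since. `PushbackLexer`, `Token` and `addSkipRule` are hypothetical names; GroovyScript's actual lexer isn't shown in this post.

```scala
import scala.collection.mutable
import scala.util.matching.Regex

// A token remembers where it started, so it can be pushed back.
final case class Token(kind: String, text: String, start: Int)

final class PushbackLexer(input: String) {
  private var pos = 0
  // Ordered rules: the first one matching a non-empty prefix wins.
  private var rules: List[(String, Regex)] = List(
    "ws"   -> """\s+""".r,
    "name" -> """[A-Za-z_]\w*""".r,
    "op"   -> """[/;]""".r
  )
  private val skipped = mutable.Set("ws")  // token kinds treated as whitespace

  // Adds a rule for text to skip, e.g. comments; new rules take priority.
  def addSkipRule(kind: String, regex: Regex): Unit = {
    rules = (kind -> regex) :: rules
    skipped += kind
  }

  // Lexes just one significant token, using the rules active right now.
  def next(): Option[Token] = {
    var result: Option[Token] = None
    while (result.isEmpty && pos < input.length) {
      val rest = input.substring(pos)
      val tok = rules.iterator
        .map { case (kind, re) => re.findPrefixOf(rest).filter(_.nonEmpty).map(Token(kind, _, pos)) }
        .collectFirst { case Some(t) => t }
        .getOrElse(throw new IllegalArgumentException(s"no rule matches at offset $pos"))
      pos += tok.text.length
      if (!skipped(tok.kind)) result = Some(tok)
    }
    result
  }

  // Backtracking: rewind so the token is lexed again, possibly under new rules.
  def pushBack(tok: Token): Unit = pos = tok.start
}

val lexer = new PushbackLexer("a // b\nc")
val first = lexer.next()           // Some(Token(name,a,0))
val lookahead = lexer.next().get   // lexed as Token(op,/,2) under the original rules
// The parser now meets something like @AddComment, so it pushes the stale
// lookahead back and changes the lexical rules before continuing.
lexer.pushBack(lookahead)
lexer.addSkipRule("comment", """//[^\r\n]*""".r)
val afterComment = lexer.next()    // the comment is now skipped like whitespace
```

Lexing lazily matters as much as pushback: a lexer that tokenized the whole file up front would already have split `//` into two operator tokens before the annotation was seen.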
Why? It happened like this... About a year ago, I started converting some hard-to-debug lexer/parser code written in Groovy to C#, just to code it anew somewhere, hoping to debug the logic. The Visual Studio editor complained about the static types not matching. I fiddled with it so the types matched, then discovered I had also debugged the logic problem. That was when I started to reconsider the supposed benefits of dynamically typed languages. Java-style static typing is verbose, but with type inference it rocks! C# has some type inference, but Scala's is incredible! I now seriously doubt that the benefits of dynamic typing outweigh its costs.
I started off building Groovier for the GrAST in Scala, but discovered Scala itself was already the grooviest language of all! First came C, then C++, then Java which should be called C3+, because after that C# came along, the sharp symbol (#) being 4 plus signs (+) joined together, which then makes Scala be C5+. Because I'm now more impressed with the Scala language engine than with (J)Groovy's, I've decided to switch GroovyScript's target platform from the GrAST to the Scala 2.8 parse tree. With inferred static typing, it's at a higher level of abstraction than the GrAST. And unlike the GrAST, I can bundle it with GroovyScript because its name is different.
But not only that, Scala's also multi-platform, running on both the JVM and the CLR. Are Codehaus (J)Groovy or Codeplex Groovy-DLR really needed? So as the underwriter for the Groovy Language, I'm also switching from (J)Groovy to GroovyScript as the primary reference implementation for the Groovy Language. The Scala language engine is now the primary platform for the Groovy Language. (J)Groovy was the first language in the Groovy Language family, but GroovyScript will soon be the leading-edge one. GroovyScript will then change its name to Groovy 2.0. Because it's now Apache-licensed, the (J)Groovy developers could adapt it to the GrAST and bundle it with (J)Groovy if they really wanted to.
16 June 2009
Gr8 isn't great, it grates
A few short months after my very first posting to the Groovy Language mailing list, Graeme Rocher changed the name of Groovy on Rails to Grails. It was too late to change Groovy's groovy name as well, but I suspect the Groovy developers will do so by stealth for version 2. As the underwriter of the Groovy Language, I must ensure Groovy's development continues, and do so under its present name.
The Groovy developers recently created the "Gr8 family of technologies" brand, i.e. Groovy / Grails / Griffon / Gant / etc (see http://twitter.com/aalmiray/status/1906155191) in direct response to my blog post at http://gavingrover.blogspot.com/2008/11/groovy-language-family.html. I suspect "Gr8" is also their upcoming name for the dynamic language engine inside Groovy 2.0, to compete with Google's V8 engine inside Chrome JavaScript. Of course, the "Gr8 dynamic language engine" would soon after become an engine for all JVM-based dynamic programming languages, itself a good idea, but the SpringSource developers might then quietly ignore Groovy 2.0 support in favor of other languages running on the Gr8 engine. Like the 5 yr old JSR at http://www.jcp.org/en/jsr/detail?id=241, Groovy 2.0 would become a carcass, its only purpose to prevent anyone else using the brand.
GroovyScript will be a GPL-licensed lexer/parser for the language engine inside the Groovy 2.0 Language. If that engine changes its name, GroovyScript will then be allowed to bundle the engine as part of its distro.
See http://gavingrover.blogspot.com for more details on the Groovy Language.
from http://gavingrover.blogspot.com
19 March 2010
Strach parser
I've posted a wiki showing some Strach combinator parsers (written in Groovy++) working with dynamic Groovy's ASTBuilder.
The operators are only tentative. Here's hoping the Codehaus Groovy developers will allow some more intuitive operators similar to those in regex, PEG, or Scala.
8 March 2010
Groovy++
About half a year ago, I began to drift away from the Groovy 1.x Language, lured by the benefits of Scala. I even seriously considered moving my Groovy Language brand to a Scala-based engine, creating the Groovy syntax over that. But lately, I've been exploring the benefits of Alex Tkachman's Groovy++, with its @Typed and @Trait keywords, and even static type inference. It's brought my heart back to the true Groovy Language, with renewed vigor to fulfill the Groovy Language vision. Scala, like Haskell and Scheme and many other languages, holds wonders, but thanks to Groovy++, I've left my wayward path to return to the true Groovy Language, in awe of what other unseen miracles those AST transformations could bring!
I've turned the Groovy site at Codeplex into the official Groovy Language dashboard. I hope to expand it and keep it up to date to acknowledge everyone who's made a significant contribution to developing the Groovy Language programming ladder.
3 March 2010
Groovy Ceasefire
I have a dilemma when creating a custom lexer/parser over Groovy's AST. The Groovy Language builders let me build the AST using the very readable builder syntax, while the Scala Language combinator parsers let me define my custom syntax using easy-to-read infix operators. The coming Scala 2.8 upgrades these to the time-efficient packrat parsers, also allowing left-recursive definitions enabling declarative definition of expression paths.
But my dilemma is I can't use both styles of syntactic shortcut in one program. What I want to code is something like this mixed Groovy/Scala style pseudocode:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.matching.Regex
import org.objectweb.asm.Opcodes._
import org.codehaus.groovy.ast._
import org.codehaus.groovy.ast.stmt._
import org.codehaus.groovy.ast.expr._
import org.codehaus.groovy.ast.builder.AstBuilder
import java.security._
import groovy.lang.GroovyClassLoader
import org.codehaus.groovy.control._
object ScalaCallingNodes extends Application with RegexParsers{
  def whitespaceLexer = (" " | "\r\n" | "\r" | "\n")*
  def nameParser = whitespaceLexer ~> "[A-Za-z_][A-Za-z_0-9]*".r
  def nameParser(name:String) = whitespaceLexer ~> name
  def stringParser(open:Regex, close:Regex) =
    whitespaceLexer ~> open ~> ("[^" + close + "]*").r <~ close
  def symbolParser(symbol:String) = whitespaceLexer ~> symbol
  def printlnParser= nameParser("println") ~ stringParser("'"r, "'"r) ^^ {case name ~ paramText=>
    new AstBuilder().buildFromSpec{
      expression{
        methodCall{
          variable "this"
          constant name
          argumentList {
            constant paramText
          }
        }
      }
    }
  }
  def blockParser(open:String, close:String) =
    symbolParser(open) ~> printlnParser <~ symbolParser(close) ^^ {case println=>
      var block= new BlockStatement([], new VariableScope())
      block.addStatement(println[0]) // buildFromSpec returns a List of nodes
      block
    }
  def paramParser= symbolParser("(") ~> nameParser("String") ~> symbolParser("[") ~>
    symbolParser("]") ~> nameParser <~ symbolParser(")")
  def mainMethodParser= nameParser("public") ~> nameParser("static") ~> nameParser("void") ~>
    nameParser("main") ~> paramParser ~ blockParser("{", "}") ^^ {case paramName ~ block=>
      // named "methods" so it doesn't shadow the builder's method(...) call
      var methods= new AstBuilder().buildFromSpec{
        method('main', ACC_PUBLIC | ACC_STATIC, Void.TYPE) {
          parameters{
            // parenthesized key: use the parsed name, not the literal string 'paramName'
            parameter([(paramName): String[].class])
          }
          exceptions{}
        }
      }
      methods[0].code= block // MethodNode's body is its "code" property
      methods[0]
    }
  def classParser= (nameParser("public") ~> nameParser("class") ~> nameParser <~ symbolParser("{")) ~
    mainMethodParser <~ symbolParser("}") ^^ {case name~method=>
      var classes= new AstBuilder().buildFromSpec{
        classNode name, ACC_PUBLIC, {
          classNode Object
          interfaces{
            classNode groovy.lang.GroovyObject
          }
          mixins{}
          genericsTypes{}
        }
      }
      classes[0].addMethod(method)
      classes[0]
    }
  var classData= """
    |public class MyClass{
    |  public static void main(String[] args){
    |    println 'Hello, world!'
    |  }
    |}""".stripMargin
  var cn= classParser(new CharSequenceReader(classData)).get
  var cu= new CompilationUnit(CompilerConfiguration.DEFAULT, null, new GroovyClassLoader())
  cu.addClassNode(cn)
  cu.compile()
}
If I code this in Scala, I must build the AST using normal method calls, and build the arrays and Groovy lists in longhand. If I code this in Groovy, I must use the quoted prefix-style to call the Scala-based parsers, and I don't think I can even code the Scala Functions in Groovy. To handle the complexity of parsing experimental syntax to the Groovy AST, I need to use both Scala-style infix method calls and Groovy-style builder syntax so as to bring enough syntactic tersity to make such large-scale experimentation possible. But they're not both available in either language.
So what's the solution? After several years of experimentation, I believe programming languages should be statically typed, with inference, but allow dynamic typing when required. Running Groovy-style builders requires dynamic typing, while combinator parsers should be statically typed for efficiency. Perhaps one day, Scala will bring optional open classes, enabling builders, but that's probably years away, if at all. Perhaps someone could strip the dynamic-typing engine out of Groovy, and retrofit it into Scala, but the Scala developers would need to tweak the Scala syntax to allow it: this doesn't sound likely.
On the Groovy side, the best option is for Groovy to have a static mode, with mixins (traits), pattern matching, and infix symbol definitions, enabling Scala-style combinator parsers to be defined. Failing that, infix calls of symbol-named methods would allow me to call Scala's parsers tersely. Groovy should soon enable both infix calls of Scala's symbol-named methods and definitions of Scala Functions, so developers can experiment with how well we can combine combinator parsers with builders.
But even better is if Groovy gets a statically-typed mode. It just so happens Alex Tkachman is building Groovy++, a statically-typed JVM-based language with Groovy's syntax. He once wrote he thought a static version of Groovy should have Scala-style traits. Groovy 1.x project leader Guillaume Laforge wants pattern matching in dynamic Groovy, and if it comes, Groovy++ would follow suit to maintain syntactic compatibility. Groovy would then only need to enable symbol-named methods and infix calls of them, so someone could clone the Scala packrat combinator parsers in Groovy++.
The symbols the Scala developers have chosen for their combinator parsers have the potential to become standardized. Many of them follow the (de facto) standardized regex symbology, and regexes can easily be embedded between the parser symbols: the regexes and parsers working together would enable a terse context-sensitive parsing language which could become a standard. The Groovy Language should support this effort by using the same symbols for its combinator parsers.
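To make the symbol argument concrete, here is a hand-rolled Scala sketch that defines symbol-named methods `~`, `~>`, `<~`, `|` and `^^` (plus a word-named `rep`) and calls them infix, with regexes embedded between them. It borrows the names from Scala's combinator library but is not that library: here `~` yields a plain tuple instead of the library's `~` case class, and there's no packrat memoization. `P`, `lit`, `re` and the small statement grammar are hypothetical.

```scala
import scala.util.matching.Regex

// A parser reads from an offset and returns Option[(value, next offset)].
final case class P[A](run: (String, Int) => Option[(A, Int)]) {
  // Sequence: both must match; keep both results as a pair.
  def ~[B](that: => P[B]): P[(A, B)] = P { (in, pos) =>
    for {
      (a, mid) <- run(in, pos)
      (b, end) <- that.run(in, mid)
    } yield ((a, b), end)
  }
  // Sequence, keeping only the right-hand or the left-hand result.
  def ~>[B](that: => P[B]): P[B] = (this ~ that) ^^ (_._2)
  def <~[B](that: => P[B]): P[A] = (this ~ that) ^^ (_._1)
  // Ordered choice: try this parser, otherwise that one from the same offset.
  def |(that: => P[A]): P[A] = P((in, pos) => run(in, pos).orElse(that.run(in, pos)))
  // Transform the result.
  def ^^[B](f: A => B): P[B] =
    P((in, pos) => run(in, pos).map { case (a, end) => (f(a), end) })
  // Zero or more repetitions.
  def rep: P[List[A]] = P { (in, pos) =>
    val out = List.newBuilder[A]
    var cur = pos
    var more = true
    while (more) {
      run(in, cur) match {
        case Some((a, next)) if next > cur => out += a; cur = next
        case _                             => more = false
      }
    }
    Some((out.result(), cur))
  }
  // Succeeds only if everything except trailing whitespace is consumed.
  def parseAll(in: String): Option[A] =
    run(in, 0).collect { case (a, end) if P.skipWs(in, end) == in.length => a }
}

object P {
  private def skipWs(in: String, pos: Int): Int = {
    var p = pos
    while (p < in.length && in(p).isWhitespace) p += 1
    p
  }
  // A literal token, after skipping whitespace.
  def lit(s: String): P[String] = P { (in, pos) =>
    val start = skipWs(in, pos)
    if (in.startsWith(s, start)) Some((s, start + s.length)) else None
  }
  // A regex token, after skipping whitespace.
  def re(r: Regex): P[String] = P { (in, pos) =>
    val start = skipWs(in, pos)
    r.findPrefixOf(in.substring(start)).map(m => (m, start + m.length))
  }
}

import P._

val name: P[String] = re("""[A-Za-z_][A-Za-z_0-9]*""".r)
val str: P[String]  = re("""'[^']*'""".r) ^^ (s => s.substring(1, s.length - 1))
val call: P[String] = name ~ str ^^ { case (fn, arg) => fn + "(" + arg + ")" }
val stmt: P[String] = call | (name ^^ (n => n + "()"))
val block: P[List[String]] =
  lit("{") ~> (stmt ~ (lit(";") ~> stmt).rep) <~ lit("}") ^^ { case (first, rest) => first :: rest }
```

Scala assigns operator precedence by the first character of the method name, so `~` and `~>` bind tighter than `<~`, which binds tighter than `^^` and then `|`. That's why the grammar lines above need so few parentheses, and it's the property any Groovy equivalent would need to keep.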
When I'm able to tersely packrat-parse code and build a Groovy AST using builders, I can then easily experiment with designing my own custom language syntax for Groovy. A few years ago, I wrote some online notes for Java newbies learning Groovy, intending to discover the most commonly used JDK methods, so I would know which ones to create syntactic shortcuts for. And of course, I want to use CJK tokens in the custom syntax, for even more tersity.
Call for a ceasefire
But there's one major problem: the Groovy 1.x and Groovy++ project managers are at odds with each other, jockeying to protect and promote their positions in the Groovy ecosystem hierarchy! This behavior isn't good for the Groovy Language. The dynamic Groovy 1.x and the static Groovy++ need to merge into one distro, and both project managers need to give a little so the Groovy Language can move forward. I feel a little responsible in calling for this to happen because of my unique association with the Groovy Language brand. When I first came across Groovy 1.x, I assumed the role of underwriter. But lately, it's finally dawned on me that not only am I the underwriter for Groovy, but I actually own the Groovy Language brand. On that infamous day in Dec 2005 when James Strachan left the developer team, the project management baton passed to Guillaume Laforge, but the brand ownership passed to me!
So I'm calling on the Codehaus despots and Alex Tkachman to think of the future of the Groovy Language in their negotiations. Its future is more important than any petty quarrels over financial placing in the ecosystem. I'm the Groovy Language brand-owner, and Groovy's my middle name, so I'm more interested in advancing the potential of Groovy than milking the largest possible share of the pie, and I'm calling for all parties to think of the future of the groovy Groovy Language, both static and dynamic, so developers everywhere will marvel at its power.
After Groovy++ and Groovy 1.x are shipping in one distro, Groovy 1.x can then be written in Groovy++. Groovy will soon after bring mixins and pattern matching, someone will then clone Scala's packrat parsers, Groovy will enable infix method calls, then I and others can experiment with terse parsing and AST building, creating a terser programming syntax to the Groovy AST, using all Unicode characters! Groovy's future is on the line: can you see the vision?
2 February 2010
Groovy being GPL'd?
In an interview with Andres Almiray, Groovy 1.x developer Alex Tkachman explains his latest creation, Groovy++, a static add-on to the dynamic Groovy Language 1.x. Simply by adding the @Typed annotation, the annotated code will compile statically. I've been wanting this addition to the Groovy Language for a long time, and in my 9 July 2009 blog entry, even talked of moving Groovy's primary implementation from the Groovy 1.7 AST to the Scala 2.8 parse tree. Alex called this crazy talk from a crazy person, but then he saw the light and began creating the statically-compiled Groovy++.
Alex writes in the interview:
There are two issues here, which prevent us from open-sourcing the compiler immediately. First of all, it uses several pieces of technology, which our company uses and plans to use in our commercial products. It was not critical when project started as experiment but now we need to extract these parts and replace/rewrite with proper open-source alternatives. The second problem is interesting by itself. We are talking with several well-known vendors about their involvement with the project. There is no much sense in finalizing exact OSS license before these discussions are not completed and we are sure that all interests are well covered. Something interesting is coming and I wish I could tell you more right now.
What are these several pieces of non-open-sourcable technology? And how did Alex code it all up so quickly? Did he use Sun's OpenJDK as the base for Groovy++? Is Groovy++ simply a more deeply embedded reincarnation of his joint Groovy/Java compiler? Is he threatening to use this irresistible update to Groovy 1.x to fork and take control of the primary implementation of the Groovy Language, even threatening to GPL it? Are these discussions with EMC/VMware/SpringSource and Oracle/Sun???
Groovy 2.0 update
As for my own current experimentation, I'm attempting to build a reasonably fast lexer/parser using Scala 2.8's packrat combinator parsers and the Groovy 1.7 ASTBuilder. The different code elements (e.g. statements, expressions) must be pluggable into the lexer/parser, without nls! everywhere to cater for the implied semi-colons. The format of different lexical elements (e.g. strings, GStrings, even dates) must be definable by annotations in the syntax. I'm hoping the left-recursion allowed by the new Scala 2.8 packrat parsing will even let us plug in path elements in path expressions.
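Until left recursion is available, one common workaround can be sketched like this: rewrite the left-recursive rule as a head followed by repeated suffixes, then fold the suffixes left to rebuild the left-branching tree the left-recursive rule would have produced. (The text above expects Scala 2.8's packrat parsers to accept the left-recursive form directly.) The path syntax `a.b[2].c` and the node names are hypothetical, and this simple tokenizer silently skips characters it doesn't recognize.

```scala
// Hypothetical AST for path expressions.
sealed trait Path
final case class Name(id: String) extends Path
final case class Select(target: Path, member: String) extends Path
final case class Index(target: Path, i: Int) extends Path

// One step after the head of the path.
sealed trait Suffix
final case class Dot(member: String) extends Suffix
final case class Subscript(i: Int) extends Suffix

val Ident = """[A-Za-z_]\w*""".r
val Digits = """\d+""".r

// Left-recursive form:    path ::= path "." name | path "[" int "]" | name
// Rewritten without it:   path ::= name suffix*
def parsePath(src: String): Path = {
  val tokens = """[A-Za-z_]\w*|\d+|[.\[\]]""".r.findAllIn(src).toList

  // suffix* : consume ". name" and "[ int ]" steps until the tokens run out.
  def suffixes(ts: List[String]): List[Suffix] = ts match {
    case Nil                                  => Nil
    case "." :: (member @ Ident()) :: rest    => Dot(member) :: suffixes(rest)
    case "[" :: (n @ Digits()) :: "]" :: rest => Subscript(n.toInt) :: suffixes(rest)
    case other => throw new IllegalArgumentException("unexpected: " + other.mkString(" "))
  }

  tokens match {
    case (head @ Ident()) :: rest =>
      // Folding left rebuilds the tree a left-recursive rule would give.
      suffixes(rest).foldLeft(Name(head): Path) {
        case (target, Dot(member))  => Select(target, member)
        case (target, Subscript(i)) => Index(target, i)
      }
    case _ => throw new IllegalArgumentException("a path must start with a name: " + src)
  }
}
```

New kinds of path element would plug in as extra `Suffix` cases plus one more line in each match.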
Of course, Scala's parsers, with their non-alphanumeric names, don't look very elegant when called from Groovy code, but I can't use Groovy's elegant ASTBuilder calls from within Scala code. A Catch-22! Perhaps one day Groovy will allow Scala's non-alphanumerically-named methods to be called elegantly from within Groovy, so they look like operator calls? Or perhaps one day Scala will allow dynamic variables, like C# 4.0, so we can create builders within Scala, making writing to its AST as elegant as in Groovy and Ruby. Building HTML with builders is more elegant than Scala's embedded HTML text.
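For what it's worth, Scala 2.10, released after this post, added the `scala.Dynamic` marker trait: calls to members a class doesn't declare are rewritten into calls to `applyDynamic`, which is enough for a simple Groovy-style builder. A minimal sketch; `HtmlBuilder` is hypothetical, and unlike Groovy's builders each nested call needs the explicit `b.` receiver.

```scala
import scala.language.dynamics

// Undeclared calls such as b.body { ... } are rewritten by the compiler
// into b.applyDynamic("body")({ ... }).
final class HtmlBuilder extends Dynamic {
  private val out = new StringBuilder

  // Writes <tag>, then the content, then </tag>. The content is by-name,
  // so nested builder calls inside it run between the opening and closing tags.
  def applyDynamic(tag: String)(content: => Any): Unit = {
    out ++= "<" + tag + ">"
    content match {
      case text: String => out ++= text  // a plain string becomes the element's text
      case _            => ()            // nested calls have already written their output
    }
    out ++= "</" + tag + ">"
  }

  def result: String = out.toString
}

val b = new HtmlBuilder
b.html {
  b.body {
    b.h1("Groovy")
    b.p("Builders in Scala")
  }
}
```

Tag names that clash with real members (`wait`, `hashCode` and so on) won't go through `applyDynamic`, a limitation Groovy's builders share in their own way.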
My progress in building a new lexer/parser for Groovy 1.x isn't as spectacular as Alex's progress with Groovy++ (assuming he didn't use OpenJDK), but I'm getting there. The Groovy Language must be fully configurable so developers can use any natural language they want in the syntax.
23 January 2010
Groovy Dilemma
In chapter 7 of his 1994 book The Language Instinct, Steven Pinker gives an example of a perfect right-branching sentence:
Remarkable is the rapidity of the motion of the wing of the hummingbird.
This is parsed in the human brain as shown by the parentheses:
(Remarkable (is (the (rapidity (of (the (motion (of (the (wing (of (the (hummingbird))))))))))))).
"Remarkable" is the subject, and the remainder is the predicate. "Is" is the main verb, and the remainder is its object (here called the complement). "The" is the article, and the remainder is its referent. "Rapidity" is a phrasal head, and the remainder is a prepositional phrase as its tail. "Of" is a preposition, and the remainder is its tail in the phrase. And so on. Pinker gives another example that's easy for the brain to parse, one that includes relative and subordinate clauses:
(He gave (the candy (to the girl (that (he met (in New York) while (visiting his parents (for ten days (around Christmas and New Year's)))))))).
He rearranges it so it's far harder for our minds to parse:
(He gave (the girl (that (he met (in New York) while (visiting his parents (for ten days (around Christmas and New Year's)))))) the candy).
The direct object "the candy", coming after the many closing parentheses, forces our short-term memories to keep track of dangling phrases that need particular words to complete them. It seems our brains, unlike computers, can only remember a few dangling branches when parsing sentences.
Perhaps that's why the Lisp code that's easiest for humans to read ends with many closing parens, such as this recursive sample from chapter 2 of Paul Graham's On Lisp:
(defun our-length (lst)
  (if (null lst)
      0
      (1+ (our-length (cdr lst)))))
Left-branching sentences are also easy for humans to parse. Pinker gives another example with two arrangements, one harder for humans to parse:
((The rapidity (of the motion (of the wing (of the hummingbird)))) is remarkable).
and the other, a perfect left-branching sentence, easy:
(((((The hummingbird)'s wing)'s motion)'s rapidity) is remarkable).
English has just a few left-branching structures, but some languages, such as Japanese, are primarily based on them.
One of the universals in Universal Grammar theory, which both Pinker and Noam Chomsky support, is that if a language has verbs before objects, as English does, then it uses prepositions, while if a language has objects before verbs, as Japanese does, it uses postpositions. Pinker mentions a possible reason this universal holds is so the language can enforce a consistent branching decision, either left-branching or right-branching, so our brains can parse it easily.
Some grammatical English sentences are impossible for our brains to parse simply because there are too many dangling branches. The first of these examples parses in our brains OK, but the other two simply don't parse:
(The rapidity (that the motion has) is remarkable).
(The rapidity (that the motion (that the wing has) has) is remarkable).
(The rapidity (that the motion (that the wing (that the hummingbird has) has) has) is remarkable).
They do parse in computer languages, though. When I discovered closures in Groovy, I started using this type of unreadable embedding, but I now realize I should be making my code either left-branching or right-branching to make it more readable.
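One hedged reading of that lesson in code (the helper names `longest` and `longestOf` are made up for the illustration): the same computation written center-embedded, with an argument left dangling after a closure; right-branching, with the rest of the work always in the last argument; and left-branching, as a method chain.

```scala
val words = List("the", "rapidity", "of", "the", "motion", "of", "the", "wing")

def longest(ws: List[String], n: Int): List[String] = ws.sortBy(w => -w.length).take(n)

// Center-embedded: the trailing argument 2 dangles after the closure,
// like "the candy" held back to the end of Pinker's rearranged sentence.
val centreEmbedded = longest(words.filter { w =>
  w.length > 3
}, 2)

// Right-branching: each call's last argument holds the rest of the work,
// so the closing parentheses simply pile up at the end, as in the Lisp sample.
def longestOf(n: Int)(ws: List[String]): List[String] = longest(ws, n)
val rightBranching = longestOf(2)(words.filter(w => w.length > 3))

// Left-branching: each step is complete before the next one starts,
// like "((The hummingbird)'s wing)'s motion".
val leftBranching = words.filter(_.length > 3).sortBy(w => -w.length).take(2)
```

All three compute the same list; only the shape the reader has to hold in memory differs.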
16 January 2010
Programming Language Structure
Programming languages have their origin in natural language, so to understand the structure of computer languages, we need to understand natural ones. According to Systemic Functional Grammar (SFG) theory, to understand the structure of language, we need to consider its use: language is as it is because of the functions it's required to serve. Much analysis of the English language has been performed using these principles, but I haven't found much on programming languages.
Functional grammar of natural languages
According to M.A.K. Halliday's SFG, the vast numbers of options for meaning potential embodied in language combine into three relatively independent components, and each of these components corresponds to a certain basic function of language. Within each component, the networks of options are closely interconnected, while between components, the connections are few. He identifies the "representational" and "interactional" functions of language, and a third, the "textual" function, which is instrumental to the other two, linking with them, with itself, and with features of the situation in which it's used.
To understand these three components in natural languages, we need to understand the stages of encoding. Two principal encodings occur when speech is produced: the first converts semantic concepts into a lexical-syntactic encoding; the second converts this into spoken sounds. A secondary encoding converts some semantics directly into the vocal system, being overlaid onto the output of the lexical-syntactic encoding. Programming languages have the same three-level encoding: at the top is the semantic, in the middle is the language syntax, and at the bottom are the lexical tokens.
The representational function of language involves encoding our experience of the outside world, and of our own consciousness. For the sake of example, here is a sentence encoded as neutrally as possible: "The Groovy Language was first officially announced by James Strachan on Friday 29 August 2003, causing some to rejoice and others to tremble."
We can analyze this as two related processes. The first has actor "James Strachan", process "to officially announce", goal "the Groovy Language", instance circumstance "first", and temporal circumstance "Friday 29 August 2003". The second is related to the first as the effect in a cause-and-effect relationship, and consists of two further, equally conjoined processes: one with process "to rejoice" and actor "some", the other with process "to tremble" and actor "others".
The interactional function of language involves injecting the language participants into the encoding. A contrived example showing many types of injects: "The Groovy Language was first announced by, of all people, creator James Strachan, sometime in August 2003. Was it on Friday 29th? Could you tell me if it was? Must have been. That august August day made some happy chappies like me rejoice, didn't it?, yeehaaaah, and probably some other unfortunates to tuh-rem-ble, ha-haaah!"
We see an informal tone, implying the relationship between speaker and listener. Glosses are added, i.e. "of all people", "august", "happy chappies like me", "unfortunates"; semantic words are added, i.e. "creator"; semantic words are removed, i.e. "officially"; sounds are inserted, i.e. "yeehaaaah", "ha-haaah"; expressions of politeness are prepended, i.e. "Could you tell me if"; and words are spoken differently, e.g. "tuh-rem-ble". Mood is added, i.e. a sequence of (indicative, interrogative, indicative). Probability modality is added, i.e. "must have", "probably". We could have added other modality, such as obligation, permission, or ability. We've added a tag, i.e. "didn't it?". We could have added polarity in the main predicate. What we can't indicate in this written encoding of speech is the attitudinal intonation overlaid onto each clause, of which English has hundreds. Neither can we show the body language, also part of the interactional function of speech.
Natural language in the human brain
A recent article in Scientific American says biologists now believe the specialization of the human brain’s two cerebral hemispheres was already in place when vertebrates arose 500 million years ago, and that "the left hemisphere originally seems to have focused in general on controlling well-established patterns of behavior; the right specialized in detecting and responding to unexpected stimuli. Both speech and right-handedness may have evolved from a specialization for the control of routine behavior. Face recognition and the processing of spatial relations may trace their heritage to a need to sense predators quickly."
I suspect the representational function of language is that which is produced by the left hemisphere of the brain, and the interactional function by the right hemisphere. Because the right side of the brain is responsible for unexpected stimuli, from both friend and foe, perhaps interactional language in vertebrates began as body language and facial expressions to denote conditions relevant to others, e.g. anger, fear, affection, humidity, rain, danger, etc. Later, vocal sounds arose as the voice box developed in various species, and in humans, increasingly complex sounds became possible. The left side of the brain is responsible for dealing with regular behavior, and so allowed people to use their right hand to make sign language to communicate. Chimpanzees and gorillas use their right hands to communicate with each other, often in gestures that also incorporate the head and mouth. The article hypothesizes that the evolution of the syllable in humans triggered the ability to form sentences describing processes involving people, things, places, times, etc. Proto-representational language was probably a series of one-syllable sounds similar to what some chimps can do nowadays with sign language, e.g. "Cat eat son night". Later, these two separate functions of natural language became intertwined in human speech.
Programming language structure
When looking at programming languages, we can see the representational function easily. It maps closely to that of natural languages. The process is like a function, and the actor, goal, recipient, and other entities in the transitive structure of natural language are like the function parameters. In the object-oriented paradigm, one entity, the actor, is like the object. The circumstances are the surrounding static scope, and the relationships between processes are the sequencing of statements. Of course, the semantic domains of natural and programming languages are different: natural languages talk about a wider variety of things, themselves vaguer, than programming languages do. But the encoding systems are similar: the functional and object-oriented paradigms became popular for programming because, between them, they make it easy for programmers to code about certain aspects of the things they use natural language to talk about. An example in pseudocode:
Date("2003-8-29").events += {
    def a = new Instances();
    a[1] = jamesStrachan.officiallyAnnounce(Language.GROOVY);
    a[1].effect = [some: s => s.rejoice(), others: o => o.tremble()];
}
The similarities between the interactional functions of natural and programming languages are harder to see. The major complication is the extra participants in programming languages. In natural language, one person speaks, and one or more people listen, perhaps immediately, perhaps later. Occasionally it's intended that someone overhears. In programming languages, one person writes. The computer reads, but good programming practice is that other people read the code later. Commenting, use of whitespace, and variable naming partly enable this interactional function. So does including test scripts with code. Java/C#-style exception handling enables programmer-to-programmer interaction similar to the probability modality of English verbal phrases, e.g. will/definitely, should/probably, might/could/possibly, won't, probably won't.
Many programming systems allow some interactional code to be separated from the representational code. One way is using system-wide aspects. A security aspect will control the pathway between various humans and different functions of the program while it's running. Aspects can control communication between the running program and different facets of the computer equipment, e.g. a logging aspect comes between the program and the recording medium, a persistence aspect between the program and some storage mechanism, an execution performance aspect between the program and the CPU, a concurrency aspect between the program and many CPUs, and a distribution aspect between the program and another program executing somewhere else. Here, we are considering these different facets of the computer equipment to be participants in the communication, just like the programmer. Aspects can also split out code for I/O actions and the program entry point, which are program-to-human interactions. This can also be done by monads in "pure functional" languages like Haskell. Representational function in Haskell is always kept separate from interactional functions like I/O and program entry, with monads enabling the intertwining between them. Monads also control all access between the program and modifiable state in the computer, another example of an interactional function.
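To make the aspect idea concrete, here's a minimal sketch in Groovy. It isn't AspectJ: it assumes Groovy's ExpandoMetaClass, overriding invokeMethod so every call on a class is wrapped with logging, much like an around-advice. The Ledger class and callLog list are hypothetical names for illustration only. The point is that Ledger's representational code contains no logging at all; the interactional concern is attached from outside.

```groovy
// Representational code: a plain class that knows nothing about logging.
class Ledger {
    int total(List<Integer> amounts) { amounts.sum() ?: 0 }
}

// Interactional concern: a call log attached from outside, aspect-style,
// by overriding invokeMethod on the class's ExpandoMetaClass.
def callLog = []
Ledger.metaClass.invokeMethod = { String name, args ->
    callLog << ('before ' + name)
    // Look up and run the original method on the receiver (delegate).
    def method = Ledger.metaClass.getMetaMethod(name, args)
    def result = method.invoke(delegate, args)
    callLog << ('after ' + name + ' = ' + result)
    result
}

def ledger = new Ledger()
assert ledger.total([3, 4]) == 7
assert callLog == ['before total', 'after total = 7']
```

A real system would more likely use something like AspectJ or Spring AOP to apply such advice system-wide, rather than one class at a time.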
Textual function of language
The textual function of language in SFG is that which concerns the language medium itself. In spoken natural language, this is primarily the sequential nature of voice, and in written language, the 2-D form of the page. Whereas in natural language theory the voice-carrying atmosphere and the ink-carrying paper are obviously media and not participants, the distinction between medium and participant is harder to draw in programming language theory. Because a program is written as much for the CPU as for other human readers, if not more so, we could call the CPU a participant. But then why can't the CPU cache, computer memory, hard-disk storage, and communication lines also be called participants? Perhaps the participants and the transmission medium for natural languages are also more similar than different.
The textual function of language is made up of the thematic, informational, and cohesive structures. Although mainly medium-oriented, they also involve the participants. The thematic structure is speaker-oriented; the informational structure is listener-oriented. The thematic structure is overlaid onto the clause. In English, what the speaker regards as the heading to what they're saying, the theme, is put in first position. Not only clauses, but also sentences, speech acts, written paragraphs, spoken discourses, and even entire novels have themes. Some examples using the lexical items James, to give, programmers, Groovy, and 2003, with theme in italics:
James Strachan gave programmers Groovy in 2003.
Programmers are who James gave Groovy to in 2003.
The Groovy Language is what James gave programmers in 2003.
2003 is when James gave programmers Groovy.
Given was Groovy by James to programmers in 2003.
In an English clause, the Actor of the representational function's transitive structure is more likely to be separated from the interactional function's Subject and from the Theme than those two are from each other. I think the textual functions of natural language are far more closely linked to the interactional function than to the representational. Perhaps the right side of the brain also processes such textual structure.
The informational structure jumps from the top (i.e. semantic) encoding level directly to the bottom (i.e. phonological) one in English, skipping the middle (i.e. lexical/syntactic) level. This is mirrored by how programming languages such as Python use lexical tokens, such as indentation, to determine semantic meaning directly. In English, speech is broken into tone units, separated by short pauses. Each tone unit has the stress on some part of it to indicate the new information. For example, each of these sentences has a different informational meaning (the bold indicates the stresses):
James gave programmers Groovy in 2003.
James gave programmers the Groovy Language in 2003.
James gave programmers Groovy in 2003.
James gave programmers Groovy in 2003.
James Strachan gave programmers Groovy in 2003.
Unlike the thematic structure, the informational structure organizes the tone unit by relating it to what has gone before, reflecting what the speaker assumes is the status of the information in the mind of the listener. The informational structure usually coincides with the thematic structure, but needn't. English grammar allows the lexical items to be arranged in any order so they can be broken up into tone units in any combination. For example, the following sentences restructure the clause so it can be divided into two tone units (shown by the comma), each with its own stress, so two items of new information can be introduced in one clause:
James gave Groovy to programmers, in 2003.
As for Groovy, James gave it to programmers in 2003.
In 2003, James gave programmers Groovy.
Programming languages should follow the example of natural languages, and allow developers to structure their code to show both thematic and informational structure. The final textual structure, the cohesive structure, enables links between clauses using various techniques, such as reference, pronouns, and conjunctions. Imperative programming languages rely heavily on reference, i.e. temporary variables, but don't use pronouns very much. Programming languages should also provide developers with many pronouns.
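Groovy already has one such pronoun: a closure with a single parameter can leave it unnamed and refer to it as `it`. Here's a small illustration of my own (the list and variable names are just for the example), contrasting an explicitly named parameter, like repeating a noun, with the implicit `it`, like using a pronoun:

```groovy
def languages = ['Groovy', 'Scala', 'Haskell']

// Explicit reference: the parameter is named, like repeating a noun.
def longNames = languages.findAll { name -> name.size() > 5 }

// Implicit 'it': Groovy's built-in pronoun for a closure's single argument.
def shortNames = languages.findAll { it.size() <= 5 }

assert longNames == ['Groovy', 'Haskell']
assert shortNames == ['Scala']
```

One pronoun is a start; natural languages offer many more, each pointing back to a different kind of earlier referent.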
Summary
Programming languages initially represented information in the same way humans do, using transitive structures such as function calls, joined by logical relationships such as blocks and class definitions. Interactional aspects of code were initially intertwined, but could be separated out using aspects and monads. Enabling different textual structures in programs isn't very widespread, so far limited to providing different views of an AST in an IDE, only occasionally allowing "more than one way to do things" at the lexical level. When used well, textual structures in code enable someone later on to more easily read and understand the program.
In promoting the benefits of programming languages enabling different textual structures, I think it's useful to narrow down to two primary structures: the transitive and the thematic, as these two are easiest to communicate to programmers. See my earlier thoughts on how a programming language can enable more thematic variation. Programming languages of the future should provide the same functions for programmers that natural languages provide for humans.
And of course, I'm building Groovy 2.0, which will both enable thematic variation in the language syntax/morphology, and supply a vast vocabulary of Unicode tokens for names. The first iteration will use Groovy 1.x's SwingBuilder and ASTBuilder, along with my own Scala-based combinator parsers, to turn Groovy 2.0 source into Groovy 1.x bytecode. The accompanying Strach IME will enable programmers to enter the Unicode tokens intuitively. Groovy 2.0 will break the chains of the Antlr/Eclipse syntactic bottleneck over Groovy 1.x!!!
1 January 2010
Bust Groovy open. Set it free!
Ever since the core dynamicity and syntactic enhancements of Groovy 1.0 beta 1 over Java, the Groovy Language has been adding functionality upon functionality. The Groovy developers at Codehaus have taken one core technology that benefits Java developers, i.e. the meta-object protocol, and used it as a hook to hang on a closetful of their own versions of widely available JVM-based technologies. Groovy 2.0 at Codeplex will be the version of Groovy that strips away such functionality, aiming to provide developers with the features of Groovy that benefit us the most. The bundled tools will be dropped, as they duplicate functionality available in other JVM languages. The Antlr-based lexer/parser will be removed, so we can interact with the AST directly from other languages. The DGM (default Groovy methods) will be stripped out, so Groovy AST users can instead use the richer classes from languages like Scala.
Lately I've discovered the Groovy AST is inconsistent in its functionality: while earlier-coded syntactic functionality is done above the AST level, much of the more recently coded functionality that could be done above the AST is instead done under its hood. In this way, as well as by supplying an ASTBuilder that can only be used from within the Groovy Language itself, the Codehaus cartel are tying programmers in to all their own added cruft, so they can sell us the book and charge us consulting fees down the line. I'll dig under the AST and scalpel out any function that can be done in other languages. When Java 7 brings closures, I'll totally replace Groovy's implementation with Java's.
During 2010, I intend to free the Groovy Language from its Codehaus chains, to bust it open and reveal its core essence, the kernel that benefits programmers most, so programmers can use it simply from other superior JVM-based languages, and so my own Strach IME and lexer/parser can use it, providing developers with a terse grammar that uses all Unicode tokens in its vocabulary. Soon after Groovy 1.8 beta 1 is released, I'll release a stripped-down version consisting only of the core essence, an AST directly controlling the MOP, to provide the JVM's answer to Microsoft's DLR: an AST all JVM language implementers can build a dynamic language on top of. I'll develop a process so that with each Groovy release, I can quickly release that release's MOP as a standalone. The Groovy Language Runtime will move from "open" source to open source. Set Groovy free!
23 December 2009
"Groovy 2010" coming ???
Groovy 1.7 is out, "in time for Christmas", and planning for v 1.8 has begun, including a new module system.
Groovy 1.x project leader Guillaume Laforge says "we would like to make a first beta of 1.8 in February or so, with a target final date for the end of the year - we love Christmas gifts". Does that mean he's going to rename it "Groovy 2010"? He once threatened to rebrand Groovy 1.6 as "GroovyX". Microsoft once tried that trick with Windows, but has since reverted to numeric versioning. Let's hope the Codehaus developers don't learn that lesson the hard way.
The new module system proposes putting Swing, XML, SQL, JMX, Beans, etc into separate modules, but the core will still be a tangled ball of many functionally different components. Everything that sits above the AST could be separated out, to encourage developers to put their own syntax on top of the AST. I'm experimenting with an alternative lexer/parser, called the "Strach" component of Groovy 2.0, aiming for greater tersity, yet retaining clarity. Experimentation is good for programming languages.
Another separable component is the Default Groovy Methods (DGM). These methods are compulsory in Groovy: if you want to use Groovy's meta-object protocol (MOP), you must also use these methods. The meta-object protocol allows programmers to add, and subsequently remove, methods on the fly, but forces these default methods on us. What if we just want to use the MOP, without the DGM? I'll be providing a component, to be called the "Wilson" component, that gives the option not to add those methods to classes, and will even let us hide default Java methods.
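Here's a short sketch of the MOP doing what I describe, assuming a Groovy version with ExpandoMetaClass (1.5 or later). It shows a DGM method (reverse) that's present on every String whether we want it or not, then adds a method to java.lang.String on the fly, and removes it again by dropping the modified metaclass from the registry. The method name shout is hypothetical, chosen just for the example.

```groovy
// The DGM is always there: java.lang.String has no reverse() method,
// but Groovy's default methods add one to every String.
assert 'abc'.reverse() == 'cba'

// Add a (hypothetical) method to String on the fly via its ExpandoMetaClass.
String.metaClass.shout = { -> delegate.toUpperCase() + '!' }
assert 'groovy'.shout() == 'GROOVY!'

// Remove it again by discarding the modified metaclass from the registry.
GroovySystem.metaClassRegistry.removeMetaClass(String)

def removed = false
try {
    'groovy'.shout()
} catch (MissingMethodException e) {
    removed = true
}
assert removed
```

Note there's no corresponding switch for turning reverse() off: adding and removing our own methods is under our control, but the DGM comes bundled with the MOP.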
Programmers aren't silly: Why can't we use the feature of the Groovy Language that really benefits us, i.e. the MOP, without having other cruft shoved on us as well, such as the syntax and DGM? Groovy 2.0 will sit atop Groovy 1.7, giving more choices to programmers.
Appendix
Here's the list of AST nodes used in Groovy 1.7, with indenting showing implementation inheritance, that Strach will free up for developers to use directly:
ASTNode
  AnnotatedNode
    ClassNode
      InnerClassNode
        InterfaceHelperClassNode
      MixinNode
    MethodNode
      ConstructorNode
    FieldNode
    ImportNode
    PackageNode
    Parameter
    PropertyNode
  expr/Expression
    expr/ConstantExpression
      expr/AnnotationConstantExpression
    expr/BinaryExpression
      expr/DeclarationExpression
    expr/TernaryExpression
      expr/ElvisOperatorExpression
    expr/BooleanExpression
      expr/NotExpression
    expr/TupleExpression
      expr/ArgumentListExpression
    expr/PropertyExpression
      expr/AttributeExpression
    expr/ListExpression
      expr/ClosureListExpression
    expr/MapExpression
      expr/NamedArgumentListExpression
    expr/ArrayExpression
    expr/BitwiseNegationExpression
    expr/CastExpression
    expr/ClassExpression
    expr/ClosureExpression
    expr/ConstructorCallExpression
    expr/EmptyExpression
    expr/FieldExpression
    expr/GStringExpression
    expr/MapEntryExpression
    expr/MethodCallExpression
    expr/MethodPointerExpression
    expr/PostfixExpression
    expr/PrefixExpression
    expr/RangeExpression
    expr/RegexExpression
    expr/SpreadExpression
    expr/SpreadMapExpression
    expr/StaticMethodCallExpression
    expr/UnaryMinusExpression
    expr/UnaryPlusExpression
    expr/VariableExpression
  stmt/Statement
    stmt/AssertStatement
    stmt/BlockStatement
    stmt/BreakStatement
    stmt/CaseStatement
    stmt/CatchStatement
    stmt/ContinueStatement
    stmt/DoWhileStatement
    stmt/EmptyStatement
    stmt/ExpressionStatement
    stmt/ForStatement
    stmt/IfStatement
    stmt/ReturnStatement
    stmt/SwitchStatement
    stmt/SynchronizedStatement
    stmt/ThrowStatement
    stmt/TryCatchStatement
    stmt/WhileStatement
  ModuleNode
  GenericsType
  AnnotationNode
15 December 2009
Try Groovy, or is it try{Groovy}catch(Exception e){} ???
Run this code in Groovy 1.6 beta 2 or earlier:
try{def a= "abc"; println a}
try{def a= 123; println a+2}
The result:
abc
125
Now run it in Groovy 1.6 RC 2. The result:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: A try block must have at least one try or finally block.
A standalone try block is great for limiting the scope of common temporary variables. Groovy enabled them in versions 1.0 and 1.5, a great improvement over Java. Scala also enables such standalone try blocks, even in the upcoming Scala 2.8.
But in this mailing list reply to me, Guillaume Laforge writes that the preferred way is to use labelled blocks, i.e.
unreferencedUselessLabel: {def a= "abc"; println a}
unreferencedUselessLabel: {def a= 123; println a+2}
What yukky syntax! Perhaps a better change would have been to increase syntactic elegance and tersity by allowing the braces to be omitted when only one statement is in the block, just like with if and while statements:
try println "abc"
try println 123 + 5
Not only that, but in someone's eagerness to restrict programmer choices, they didn't bother checking the error message: "A try block must have at least one try or finally block." (It should say "catch", not "try". To be fair, the message has since been corrected for Groovy 1.7.)
As the Groovy Language Underwriter, my job is to be ready to continue Groovy Language development should the developers at Codehaus abandon it or change its name. However, they actually seem to be gradually removing Groovy developers' choices by stealth, dumbing down the syntax, especially now that SpringSource and EMC/VMware are bankrolling Groovy Language development, or perhaps bankrolling the lack of it.
Part of my mission in creating an alternative lexer/parser for the Groovy AST is to bring back programmer choices when utilizing the Groovy AST. To quote another freedom fighter, removing programmer power from the Groovy Language syntax is something up with I will not put.
4 December 2009
Groovy 2.0 status report
I grew up in Auckland, New Zealand, living there for 30 years, but never considered creating a scripting language until I'd moved to Melbourne, Australia 10 years ago. There, I lived in the CBD, an 8-by-8 grid of blocks known as the Hoddle Grid. I went for many walks while there, often thinking of that CBD as a huge chessboard on which to play out games. (I had a passing interest in chess when I was a kid.) The apartment I lived in was on a popular block of the grid, where the commercial, entertainment, and recreational precincts meet. In the chessboard analogy, that block is the starting position for the white king.
There, I ran a company, GroverServer Ltd, dayjobbing as a programmer to raise funds, while working on a process to model company annual reports as Access databases. Although I'd used Access many times since its v1.0 release, when I used it for a real-world complex business task, it wasn't flexible enough. I eventually concluded the VBA scripting language was nothing but a marketing con. The true genesis of Groovy happened soon after this, during a trip I made to England in Feb 2002. I decided a scripting language should run on a VM, Java's being the leading one at the time. It should have a flexible syntax, certainly not like VB's, and must enable AspectJ-style interceptions and introductions. Of course, I never met Groovy Language creator James Strachan while there. He began building Groovy 1.x soon after this, yet another of his many open source projects, recruiting the developers who now control it.
I stayed silent on the mailing list for a year, not wanting them to change the name while they still could easily. But then came that infamous day after DevCon2 in Dec 2005 when James left the development team. Soon after, I posted my very first posting to the Groovy Language mailing list. Six weeks later, Graeme Rocher changed the name of _Groovy on Rails_ to Grails, but it was too late for them to change Groovy's groovy name as well. By then I had learned enough about Groovy to continue its development should the developers abandon it or change its name: I had become Groovy's underwriter.
I decided to get more involved in the language, first by submitting bug reports and change requests. My very first request was for a groupBy method I'd found useful for munging data. Guillaume Laforge must have also thought it was a good method to have, because he created his own similar request a few months later with the same method name. After many similar happenings, I began to realize the Codehaus Groovy developers didn't want me around.
So I decided to branch off on my own, using the Groovy Language AST as an engine to power a different configurable programming language syntax and IME. While working on this, I discovered why statically-typed languages are better than dynamically-typed ones for large systems. So I switched to programming in Scala. At first, I thought I could use the Scala AST as a target instead of Groovy's, but I now realise dynamicity is essential in those few use cases that require it, so I'm back to targeting the Groovy AST, using the Groovy 1.7 ASTBuilder, created by Hamlet D'Arcy. For the static mode, I'll still have to target the Scala AST, though the ideal solution is if Groovy added a static mode. However, developer Alex Tkachman seems to be vetoed by project manager Guillaume Laforge on this. Years ago, the Codehaus Roadmap for Groovy 3.0 was for it to be written in Groovy, which would have required a static mode, but this idea seems to have been trashed.
I'll still be using Scala as the systems language in building Groovy 2.0. The Codeplex Groovy site will initially distribute "Groovy 1.7 with Strach", where Strach is presently just the lexer/parser, written in Scala, targeting the Groovy 1.7 AST. Eventually, it'll also target the Scala 2.8 parse tree for the static mode (unless, of course, Alex Tkachman succeeds in putting a statically-typed mode into Groovy). When distributing Groovy 1.7 with Strach, I'll experiment with replacing selected Java-source classfiles from the Groovy 1.7 jar file with my own Scala-source ones. Given enough time, I could even manage to totally rewrite Groovy in Scala using this process!
31 October 2009
Strach IME and Groovy 2.0 making progress
The webpage for the Strach IME has been created. The Groovy page at Codeplex has been repurposed as the primary distro site for the Groovy Language 2.0 next year. Further details are on each of those webpages.
27 September 2009
Scala's groovy stairway
Paul Graham writes in Revenge of the Nerds how Lisp and Fortran are the trunks of two separate evolutionary trees in programming language evolution. He then lists various features of Lisp which have been making their way into languages in the Fortran language tree, including dynamic typing. Having tried out many programming languages over the past few years, I now see programming language evolution differently. Unlike Paul, I see dynamic typing as being a lack of a feature: Static typing is the true feature.
Furthermore, Lisp macros can be thought of as a low-level feature comparable to goto statements and pointers. All three can be abstracted over with higher-level abstractions. Let's look at some abstracted-away low-level features...
Gotos and breaks
The original programming language was of course assembly language. Assembly had the same basic features as machine code, only a little more readable. We could branch to another part of the code based on a data value: we could use this to implement conditionals and looping. Algol enabled statements to be grouped into lexically scoped blocks. With all these, we could eliminate goto statements. We could also store the program counter in a variable, branch to some different code, then later return to the place we left off: this is subroutine calling. BASIC implements this as a "GOSUB" statement. Fortran enabled subroutine parameters; Algol enabled return values; C and Lisp brought recursively-called subroutines; and Scheme brought closures. C++ and Java implemented exception throwing, giving better control flow. Scala, also having closures and exception-throwing, eliminates the break and continue keywords, as they're incompatible with passed-around closured code, and in version 2.8, re-implements break with exceptions.
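Groovy closures can't use break either, so here's a sketch of the exception-based technique I mean, written in Groovy rather than Scala. BreakSignal, breakable, and brk are hypothetical names of my own, loosely modelled on the breakable/break pair that Scala 2.8's scala.util.control.Breaks provides; they're not a Groovy API.

```groovy
// Hypothetical control-flow signal; not part of Groovy's API.
class BreakSignal extends RuntimeException {}

// Runs the body, treating a BreakSignal as "leave the loop now".
def breakable(Closure body) {
    try {
        body()
    } catch (BreakSignal ignored) {
        // Swallow the signal: control continues after breakable { ... }.
    }
}

// 'break' is a keyword, so the throwing helper gets another name.
def brk() { throw new BreakSignal() }

def seen = []
breakable {
    [1, 2, 3, 4, 5].each { n ->
        if (n > 3) brk()
        seen << n
    }
}
assert seen == [1, 2, 3]
```

The exception unwinds through the closure passed to each, which a plain break statement can't do.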
Pointers and objects
Cobol brought static typing. C enabled static typing for pointed-at data. Simula and Smalltalk introduced objects. Different inheritance models were tried out: C++ used multiple inheritance, Java and C# used single implementation inheritance, Self and JavaScript used the prototype model, while Ruby and Scala used the flexible mixin model. Ruby also has open classes, at the cost of eliminating static typing. Lisp, Ruby, and Java/C# had garbage collection. By using objects everywhere, a language no longer needs pointers.
Human interface
Third-generation languages enabled more meaningful names, making code more readable, but longer. Fortran, C/C++, and Java/C# brought operator precedences, eliminating parentheses, thus shortening the code again. Scala simplified the rules for this. In Lisp and Scala, statements are also expressions, returning a value. Interactive Python's magic underscore is a simple way to pass a statement value onwards. APL, and successors J and K, brought greater tersity through a greater vocabulary of tokens. Matlab and R continue along this way for math and stats. Perl enabled thematic variation, bringing "more than one way to do it". Smalltalk was programmed in a built-in visual environment, as were spreadsheets, etc, and IDEs, all using color. Declarative paradigms, like Snobol, regexes, and Prolog, make a program more readable. Indenting was used by Cobol and extended by Python. Haskell offers the choice of C-style or indent-style bracketing.
Efficiency and concurrency
Pure Lisp is very inefficient, but nowadays different data structures are builtin. Numbers were always hard-coded, direct-access arrays added later, and with Scala, even objects that inherit are builtin types. Java brought threads for concurrency, while Erlang and Scala brought the safer higher-level actor model.
Macros and laziness
Lisp enabled macros to control evaluation in code. Scheme enabled lazy evaluation and Haskell made it compulsory, eliminating much need for macros. Scala gives the choice of strict or lazy evaluation, in a statically-typed language. Better compilers can automatically detect and inline code that would normally require programmer-control with macros. AspectJ-style aspects and Haskell-style monads also allow code to be self-referenced and manipulated in a program.
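A classic job for a macro is a user-defined control structure, such as an if that must not evaluate the branch it doesn't take. Here's a minimal Groovy sketch of getting that through delayed evaluation instead: passing closures (thunks) rather than values, roughly what Scala's by-name parameters do implicitly. The myIf method is a hypothetical name for illustration.

```groovy
// With plain values, both branch expressions would be evaluated before
// myIf even runs (strict evaluation). Closures delay each branch until
// it is explicitly called, so only the chosen one ever executes.
def myIf(boolean cond, Closure thenPart, Closure elsePart) {
    cond ? thenPart() : elsePart()
}

def evaluated = []
def result = myIf(1 < 2, { evaluated << 'then'; 'yes' }, { evaluated << 'else'; 'no' })

assert result == 'yes'
assert evaluated == ['then']   // the else branch never ran
```

The cost is the extra braces at the call site; a macro, or Scala's by-name parameters, would hide them.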
There are different tradeoffs between these feature sets, and creating a programming language that combines them is difficult. ML, and successors Haskell, Caml, and F#, achieved this when combining static typing with functional programming. OCaml and Scala successfully combined object-oriented programming with functional programming.
IDEs generally build on the lexical structure of a programming language. The Scala language compiler is designed as a stairway of increasingly-higher abstractions. Near the top is the parse tree, one step short of the lexical structure. I'm attempting to build on top of this parse-tree layer of Scala 2.8. I want to add APL/J/K-style tersity to the syntax, including enabling me to use a foreign language (simplified Chinese) everywhere in my Scala code. When done, I'll release this language as "Groovy 2.0".
9 September 2009, 9:09:09pm
Groovy life and death
Rick Dillon recently posted this analysis of programming language evolution. The traditional imperative languages like C/C++, and the newer ones like Java and C#, are statically typed, while the traditional functional languages like Lisp/Scheme, and the newer semi-functional ones like Python, Ruby, and Javascript, are dynamically typed. He gives a code sample implementing functional currying in statically-typed Java, which turns out to be quite verbose because of the explicit static types. To put static typing into functional programming while retaining tersity, we require type inference. ML/Caml and Haskell are examples of such languages, and OCaml/F# and Scala are object-oriented examples. Seen this way, dynamic languages are a dead end in programming language evolution. Instead of maintaining the "systems language / scripting language" duo, future language evolution will go along the "functional language with inferred static typing" route.
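For comparison with that verbose Java, here's roughly how currying looks when no type annotations are needed, sketched in Groovy. The first closure is true currying, a one-argument function returning another function; the second uses Groovy's built-in Closure.curry, which is strictly partial application of the leftmost arguments. The names add, addTwo, volume, and baseOf6 are just for illustration.

```groovy
// True currying: a one-argument closure that returns another closure.
def add = { a -> return { b -> a + b } }
def addTwo = add(2)
assert addTwo(3) == 5

// Groovy's curry() is partial application: it fixes the leftmost
// arguments of a multi-argument closure and returns a closure of the rest.
def volume = { l, w, h -> l * w * h }
def baseOf6 = volume.curry(2, 3)
assert baseOf6(4) == 24
```

Type inference aims to keep this tersity while still checking the types statically.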
I suspect many programmers coming to dynamic languages will follow the same path I did in realizing this. They will typically work in Java, C#, Cobol, PHP, and/or VB in their dayjobs. They'll discover Python or Ruby, though for me it was Groovy with the nifty closures and collections. At first, they'll just use it for scripty stuff, then start trying to build bigger and bigger systems. They'll then realise the lack of static typing means they've thrown out the baby with the bathwater. The functional languages with inferred typing will then beckon. Groovy programmers will start learning Scala because it runs on the JVM. At first they'll think that Scala will only replace Java, so they can use Groovy and Scala together, but eventually they'll see that statically-typed functional languages can replace both members of the "systems language / scripting language" duo! (Perhaps some will even say, as I did, that using the Groovy Language started off being useful, but "what began as life to me has now become death"!)
Lately, I've been trying to understand the interplay between different features in these types of languages, such as monads, macros, and mixins:
(1) Monads from Haskell enable computer languages to cleanly split program code into functional-pure and side-effecting components. Aspects, as in AspectJ and Spring, are frequently used in a system-wide manner in non-functional-paradigm languages to separate out certain non-paradigm concerns such as I/O, persistence, exception-handling, optimization, etc, from the primary representational concern. This type of separation between the representational and interactional functions of a programming language mirrors that in natural language, as analyzed in Hallidayan Systemic Functional Grammar theory.
(2) Syntactic macros can provide the most user-configurability at the surface levels, as in Lisp/Scheme and Dylan. Most programming languages provide much power in the engine, then deliberately bottleneck it for the language syntax, only to return it to the programmer at the IDE level. Natural languages don't do this, and I don't think computer languages should, but a programming language syntax is considered a holy grail for marketing the language, so not many languages have dared to allow such syntactic configurability in the past. Perhaps this syntactic component of programming languages mirrors the textual component of natural language in Systemic Functional Grammar.
(3) Scala traits (i.e. mixins) provide a more flexible yet still correct OOP system than either single or multiple inheritance. The Scala website shows how they can be used to cleanly implement the Observer pattern, the very pattern the AspectJ evangelists 10 yrs ago were saying aspects could easily implement in the non-functional language Java.
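As a rough analogue of the Scala traits example, here's the Observer pattern packaged as a mixin, sketched in Groovy. Note this assumes a later Groovy than the one I'm discussing: plain Groovy only gained traits in version 2.3, so this won't run on Groovy 1.7. Subject and Account are hypothetical names for illustration, not the Scala website's code.

```groovy
// Subject behaviour packaged as a reusable trait (requires Groovy 2.3+).
trait Subject {
    private List<Closure> observers = []

    void subscribe(Closure observer) { observers << observer }

    void publish(Object event) {
        // Notify every registered observer of the event.
        observers.each { Closure o -> o(event) }
    }
}

// Any class can mix in the behaviour without inheriting from a base class.
class Account implements Subject {
    int balance = 0

    void deposit(int amount) {
        balance += amount
        publish("deposited $amount")
    }
}

def events = []
def account = new Account()
account.subscribe { e -> events << e.toString() }
account.deposit(10)

assert account.balance == 10
assert events == ['deposited 10']
```

Because Account still has its superclass slot free, the same trait could be mixed into classes from unrelated hierarchies.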
No single statically-typed functional language provides all these features, not that I yet understand them all, and how they relate to each other. I do intend to return to creating a shell over the Scala language parse tree once Scala 2.8 is out because I think this is the best opportunity to evangelize full Unicode character set programming to the world. The shell will be called GroovyScala.
21 August 2009
A Groovy Undertaking
This mailing list reply from Jochen Theodorou popped up on the Reddit-programming charts recently, probably stage-managed damage-control. Jochen wrote: "James (Strachan) is great in initiating projects and gets them to a state where the examples work. But as soon as you go away from the examples and alter them just a tiny bit, it fails." Yeah, that's called Test-driven development. The solution: add more tests, then make them work! I'm a great believer in it. Jochen also wrote: "...as an active part (James dumped Groovy) over four years ago already. (...) You can say that current Groovy is Guillaume (Laforge) and me mostly, but many people did come and go, some did contribute a lot... like for example John Wilson, other did only cover a small area." The Groovy developers seem to be positioning James as only one of many Groovy Language "creators". What's up? Is someone else hoping to stand in for language creator James in an upcoming Groovy Language interview in Australian Computerworld?
Why did some developers "only cover a small area"? Perhaps they started getting harassed after surfacing on the Groovy mailing list, as I did three years ago? At the time I thought it was just my name! I knew anyone could've been doing it. But around that time, two UK teachers at my university took me to dinner and warned me that "anyone who takes on Google comes off second best". I doubt it was Google who put them up to that, and why would the Groovy FUD-spreaders do so? I suspect the real reason the present developers took control of Groovy was to try to sell it to Google as a brand-name fit. But I didn't really understand how I was a threat to them. Early the following year, when Groovy 1.0 was finally released, the licence still clearly said: 4. Products derived from this Software may not be called "groovy" nor may "groovy" appear in their names without prior written permission of The Codehaus. "groovy" is a registered trademark of The Codehaus. I was just piggybacking on the Groovy name because I thought it might be a good gimmick if I wanted to return to programming work one day, not for any other reason.
Then one day I was fooling around online and searched the U.S. trademark database for the Groovy Language details. They weren't there! They weren't even in the history of lapsed trademarks. Codehaus was a US-based outfit, wasn't it? Beta-1 of Groovy 1.1 was then released under the Apache licence. Groovy's previous licence had only been a bluff! But was that really a reason to harass me? Other programming languages don't trademark their names. I suspect that if I changed my name to Scalia Scalow and surfaced on the Scala mailing list, no one there would feel insecure enough to harass me because of my name. It seems there's an essential difference between the two languages. Scala is a quality language designed within academia, though intended for business, to bring present Java and C# developers a little closer to functional programming. Groovy is an ad hoc commercial creation, designed to flip the investing companies at a profit: first Bay Partners, then SpringSource, and now VMware. I suspect it's that difference that makes the Groovy Language developers ultra-picky about who's involved in the development.
Jochen also wrote: "Developing a language is a lot of stress. You have to discuss things on an emotional level very often. (...) And many people get tired of these discussions, so did James and so did for example John." Is he priming the community for the next departure, perhaps himself or Guillaume? I imagine VMware had more cash in their "cash and stock" offer for SpringSource than SpringSource had in theirs for G2One, so perhaps Guillaume has suddenly lost some motivation to continue with (J)Groovy development. And what about me? I've spent 4 years working on an idea to make programming languages terser using all Unicode tokens. I stayed in mainland China because it's the home of the simplified Chinese characters, and I targeted the Groovy AST because of its ad hoc construction and its groovy name. After all that, I confess I'm also getting a little tired of it all. When I first got involved, I never really knew what a truly dirty business open source software development is.
8 August 2009
Groovy futures
The (J)Groovy developers recently released beta-1 of "Groovy 1.7". But will it really be called version 1.7? The developers changed the name of Groovy 1.1 to 1.5 at the last moment, and they might do it again with version 1.7. Besides plucking some stuff out of Spock and ASM, they've begun on inner classes, the main Java feature still missing from (J)Groovy. And they've dusted off the GroovyScriptEngine and rewritten it, probably as a snipe at my own GroovyScript-branded version of Groovy for the Scala parse tree. I'm not sure how many developer hours SpringSource threw at beta-1, but I suspect not many. They need to keep up the appearance of developing the Groovy Language while continuing to collect consulting fees, to get a high valuation in their talks with JBoss or whoever it is.
While "Groovy 2.0" has been talked about as the version of (J)Groovy shipping with a new improved MOP, this current 1.7 line might end up with that name, not because it has a new MOP but for marketing reasons only. Perhaps they'll bring out new editions of their books. The Groovy Language release schedule now seems to completely revolve around marketing and training events. But where would the new MOP fit in? If the developers finally manage to do what John Wilson couldn't, what Groovy 1.x botched, in producing a Java-language compatible MOP for the JVM, would SpringSource really want to prewrap it in a programming language and tag it with the "Groovy" brand? I'd think they'd want to pitch it as the JVM's answer to Microsoft's DLR, something like "the Spring DLR for the JVM", and promote it for all JVM-based dynamic languages.
But even without a new MOP, doesn't the Groovy Language 1.x still have a future? It's certainly the language of choice for Grails. For other use cases, such as scripting and testing, it may now be superseded. Before Groovy, developers used JPython. Some pitched Groovy as a better choice because it's Java-syntax compatible, but I don't think they really understood the mindset of a typical corporate programmer. Programmers want to expand their skillsets, so they would rather choose JRuby for scripting and testing precisely because it's NOT Java-syntax compatible. JRuby is curriculum-vitae compatible for Java developers, being another step up to a Rails job. And what of (J)Groovy's recent push with Griffon? With Grails there was little serious competition for Groovy, but with Griffon, Groovy is up against the might of JavaFX. The recent trend of calling (J)Groovy "Groovy on Grails" may have hit the mark.
As the Groovy Language underwriter, I often think about Groovy's future, both the technology and the brand, and I'm now a little pessimistic. A week after I said I was switching from C# to Scala/JVM for programming to the Groovy Language AST, Groovy Language creator James Strachan (by another total coincidence :-) bought and read the Programming Scala book. He subsequently said he thought Scala was a better choice than Groovy for systems programming. After programming in Scala for a mere month, and that only part-time, I realized Scala is already the language I was trying to modify the (J)Groovy AST to be. There are still a few things Scala lacks, such as syntactic macros, but I've no doubt they'll be coming in an upcoming version of Scala. The programming I did do in Scala (building a combinator parsing library) turned out to be redundant: a better version already existed in the Scala libraries. Recently I started to wonder why Codehaus (J)Groovy/JVM and Codeplex Groovy/DLR exist when I could target both platforms via the Scala parse tree. But now I'm wondering why I'm programming at all.
Consider the whole picture. (J)Groovy is too minimal a wrapper for an upcoming "SpringSource DLR for the JVM". Corporate developers prefer JRuby to Groovy for scripty stuff. Swing is already being targeted effectively by JavaFX. And Scala is becoming recognized as the best choice for new systems programming on the JVM. Given all that, the only role for Groovy in the foreseeable future seems to be as Groovy 1.x for Grails. People will probably just call it "the Grails language". Perhaps the only future for the Groovy brand is as my middle name.
28 July 2009
One Groovy Language to rule them all
Some say that because programmers read code far more than they write it, it's better for a language to have a clean minimal syntax, so we can easily read code others have written. But natural languages don't work that way.
Whatever our native language, we can read many more words than we use when we write, and we can understand in listening many more words than we usually speak. As we learn our native tongue as children, we hear many varieties of it and much vocabulary, learn to understand it quickly, but we seldom reproduce most of it. Even as adults, it doesn't take long when listening to a new flavor and accent of English to understand it, but we take much longer to reliably imitate it, if at all.
I once spent a couple of years studying natural language, then returned to programming as a hobby. I'd never liked Perl: the "there's more than one way to do it" philosophy had never appealed, and I'd preferred the more minimal syntax of Python or Smalltalk. But when I returned to programming, what Larry Wall's been saying all these years began to make sense. Programming language designers who restrict what the language can do, providing only one way to do things, are like grammar school English teachers who try to prescribe to their students what correct English is.
Mainframe programmers read lots of Cobol programs to understand their meaning, but write comparatively little of it when maintaining programs. An experienced Cobol programmer can flick through a printout and quickly understand the program. Computer Science students read the C code in the Unix kernel but seldom change it. They read it so they can read C code easily. We should become able to understand code written by others not by constraining what others can write, but through more experience reading what others have written.
But one programmer can only really read code easily in one or two computer languages, just as most people can only learn one or two natural languages really well. For this reason, programmers are categorized by the language they program in.
The Groovy Language will solve this problem by being available for every AST. As well as the (J)Groovy flavor for the Groovy/JVM AST, there is the GroovyScript flavor for the Scala parse tree and the Groovy-DLR flavor for Microsoft's DLR, with flavors for other AST platforms to follow. The original (J)Groovy syntax was a close copy of Java's while being semantically different, just as Java's was a close copy of C++'s while also being semantically different. So someone who knew Java had a head start learning Groovy, and so on.
The Groovy Language will be the end of the line for the C syntax, available for every practical AST, and so replacing other programming language syntaxes. Therefore, someone who learns (J)Groovy can then switch to using Groovy-DLR easily, just as someone who learns British English can switch to using Indian English easily. There will be one Groovy Language to rule other computer language syntaxes. I guess they'll eventually become obsolete.
27 July 2009
What makes Scala groovy?
Lately, I've been thinking about what makes Scala groovier than (J)Groovy...
(1) Nested classes and packages. I can nest my class definitions any way I want when doodling, i.e. doing experimental programming. Groovy DevCon 5 talked about nested classes for Groovy 1.7. Anonymous inner classes aren't necessary, though, as closures can simulate them.
(2) Pattern matching. Pattern matching is an incremental addition to a programming language that, once learnt, is hard to do without. Groovy 2.0 is slated to bring pattern matching, but some think the coming Groovy 2.0 is a myth, just like JSR 241 and the Groovy language spec.
(3) Combinator parsing. People are bored with the limits of regexes, and want more declarative power in parsing stuff. Scala now has a terse combinator parsing syntax, and Scala 2.8 will introduce the more efficient packrat parsing trait for them.
(4) Consistency of syntax and semantics, e.g. the method/field uniform access principle. This contrasts with the tacked-on approach of (J)Groovy, which is needed to ensure seamless backwards-compatibility with Java classes. Returning to Groovy coding after working with Scala's "clean break with Java" design, though, is harder than returning to Java coding after working with Groovy. Scala's operator/method and parameter/indexing dualities are features that could successfully be put into Groovy, though.
(5) Mixins/traits. The (J)Groovy 1.1 (betas) AST had empty stubs for mixins, but the Groovy developers never implemented them. After seeing how Scala traits could do things I thought were only elegant with aspects, e.g. the Observer pattern, I now believe Groovy needs those mixins.
(6) Inferred static typing. After working with this in an IDE, one wonders how the "more tests are better than more typing" lie spread so rapidly. Inferred static typing is "more typing (static) with less typing (fingers on keyboard)".
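Here is item (2) made concrete: a small Scala sketch of pattern matching over a sealed hierarchy. `Expr`, `Num`, `Add` and `Mul` are hypothetical types invented for this sketch, not taken from any library. Nested patterns such as `Mul(x, Num(1))` test structure and bind names in one step. Because the trait is `sealed`, the compiler can warn when a match misses a case.

```scala
// Hypothetical expression types used only to illustrate pattern matching.
object PatternDemo {
  sealed trait Expr
  final case class Num(n: Int) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Mul(l: Expr, r: Expr) extends Expr

  // One case per constructor; the sealed trait lets the compiler check exhaustiveness
  def eval(e: Expr): Int = e match {
    case Num(n)    => n
    case Add(l, r) => eval(l) + eval(r)
    case Mul(l, r) => eval(l) * eval(r)
  }

  // Nested patterns: rewrite x * 1 and 1 * x to x, recursing into subterms
  def simplify(e: Expr): Expr = e match {
    case Mul(x, Num(1)) => simplify(x)
    case Mul(Num(1), x) => simplify(x)
    case Add(l, r)      => Add(simplify(l), simplify(r))
    case Mul(l, r)      => Mul(simplify(l), simplify(r))
    case n: Num         => n
  }
}
```

For example, `PatternDemo.eval(Add(Num(2), Mul(Num(3), Num(4))))` evaluates to 14.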
What (J)Groovy features are groovier than Scala's?
(1) Builders. I've heard Ruby copied this feature from Groovy. Has it been done in Scala? Scala's syntax already allows it, though I've yet to see a Scala implementation of Groovy's HtmlBuilder. Scala's inline XML syntax is ugly compared to builder-based syntax.
(2) GStrings. Also known as interpolated strings, they enable us to do much commonly used string handling, e.g. printing, with a terser syntax. Perhaps Scala's scalable syntax could enable these without syntax changes, I don't know.
(3) Dynamic typing. Dynamic typing is useful in the 20% of code where static typing isn't suitable. Just as dynamic Python enhances static C code, dynamic Groovy enhances static Java code. Static languages can emulate some dynamic typing features by typing everything with the Object type, or using an expando object. Open classes that enable inheritance may be impossible to emulate, though. (Groovy also enables built-in static typing which is slower than its dynamic typing. Use this feature for interface documentation only, use Java instead for other static typing requirements.)
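Item (1) asks whether Scala's syntax already allows builders. A minimal sketch using only the standard library suggests that curried methods with by-name parameters get close to Groovy's builder style. The `Html` class and its `tag`, `text` and `render` methods are hypothetical names for illustration. This is not a port of Groovy's HtmlBuilder, and it does no escaping of text.

```scala
// A hypothetical, minimal HTML builder in Scala using curried methods and by-name parameters.
object HtmlBuilderDemo {
  final class Html {
    private val sb = new StringBuilder
    // `body` is by-name, so it runs between the opening and closing tags
    def tag(name: String)(body: => Unit): Unit = {
      sb.append("<" + name + ">")
      body
      sb.append("</" + name + ">")
    }
    // Plain text is appended as-is (no HTML escaping in this sketch)
    def text(s: String): Unit = { sb.append(s) }
    def render: String = sb.toString
  }

  def page: String = {
    val h = new Html
    import h._
    // Nesting of the braces mirrors nesting of the markup, as in a Groovy builder
    tag("html") {
      tag("body") {
        tag("p") { text("Hello, builders") }
      }
    }
    h.render
  }
}
```

Calling `HtmlBuilderDemo.page` yields `<html><body><p>Hello, builders</p></body></html>`. Groovy's builders can also intercept arbitrary method names dynamically. This static sketch can't, so every tag goes through the one `tag` method.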
It seems Scala could copy (J)Groovy's groovy features far more easily than Groovy could copy Scala's. Groovy pitches itself as "complementing, not competing with" Scala. Yet since I started programming in Scala, I've found little that Groovy is an obviously better fit for. Scala's terseness and inferred typing are addictive.
What could make both Scala and (J)Groovy groovier?
Self-mutating syntax. This would enable syntactic macros and keyword aliasing, thus putting Scala into the realm of Lisp/Scheme. Some of Scala's syntax looks like it could be redefined as a syntactic macro, e.g. the for comprehension could generate the underlying calls to map, filter, etc. After pulling out these types of simplifications, perhaps Scala's remaining syntax would be easily handled by a library based on Scala's own parser combinators, making Scala syntax self-referential.
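To make the for-comprehension point concrete: the Scala compiler already rewrites a for comprehension into calls to `withFilter`, `flatMap` and `map`. The sketch below writes both forms side by side, with arbitrary sample values. A syntactic macro facility could, in principle, let a library own this rewrite instead of the parser.

```scala
// The same query written with for-comprehension syntax and as the calls it is translated into.
object ForDesugar {
  val xs = List(1, 2, 3, 4)
  val ys = List("a", "b")

  // Surface syntax: a generator, a guard, a second generator, then yield
  val sugared: List[String] =
    for (x <- xs if x % 2 == 0; y <- ys) yield y + x

  // Roughly what the compiler produces: the guard becomes withFilter,
  // the outer generator flatMap and the innermost generator map
  val desugared: List[String] =
    xs.withFilter(x => x % 2 == 0).flatMap(x => ys.map(y => y + x))
}
```

Both values are `List("a2", "b2", "a4", "b4")`.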
The GroovyScript source code I've posted enables annotations to define lexical and syntactic features of a C-syntax language such as (J)Groovy or Scala, perhaps another way of making it self-referential. It requires using monadic bind and return/value parser combinators, making it a "context-sensitive" grammar. Packrat parsing can parse the grammars it handles (parsing expression grammars) in linear time, though at the cost of memory space. Can multicores keep such context-sensitive parsing tractable as well?
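The following sketch shows how bind makes parsing context-sensitive. It is a tiny, hand-rolled Scala parser type built on bind (`flatMap`) and return (`unit`). It is not GroovyScript's code and not Scala's combinator library. `Parser`, `number`, `take` and `counted` are hypothetical names. The example is a length-prefixed field: the number parsed first decides how many characters the next parser consumes. A fixed context-free rule can't state that kind of dependency.

```scala
// A hand-rolled parser type (hypothetical, not Scala's combinator library) showing bind and return.
object MonadicParsing {
  // A parser consumes a prefix of the input, yielding a value and the remaining input, or fails
  final case class Parser[A](run: String => Option[(A, String)]) {
    // bind: the parser to run next is computed from the value just parsed
    def flatMap[B](f: A => Parser[B]): Parser[B] =
      Parser(in => run(in).flatMap { case (a, rest) => f(a).run(rest) })
    def map[B](f: A => B): Parser[B] = flatMap(a => Parser.unit(f(a)))
  }

  object Parser {
    // return (also called unit or value): succeed with `a` without consuming input
    def unit[A](a: A): Parser[A] = Parser(in => Some((a, in)))
  }

  // One or more decimal digits read as an Int (no overflow check in this sketch)
  val number: Parser[Int] = Parser { in =>
    val ds = in.takeWhile(_.isDigit)
    if (ds.isEmpty) None else Some((ds.toInt, in.drop(ds.length)))
  }

  // Exactly n characters, or failure if fewer remain
  def take(n: Int): Parser[String] =
    Parser(in => if (in.length >= n) Some((in.take(n), in.drop(n))) else None)

  // Length-prefixed field: the count parsed first decides how much the next parser takes
  val counted: Parser[String] = for { n <- number; s <- take(n) } yield s
}
```

For instance, `counted.run("3xyz")` gives `Some(("xyz", ""))`, while `counted.run("5ab")` fails because only two characters follow the count.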
22 July 2009
Scala eclipses (J)Groovy
I've posted beta-2 of GroovyScript. GStrings now parse. The parser, written in Scala, uses a pushback lexer so lexical tokens can be defined in the parsed syntax using annotations. What's there may be useful for anyone wanting to see how a parser with a pushback lexer can work. However, I'm now looking at whether I can rewrite it as an extension to Scala's built-in combinator parser library, so don't expect anything more for a while.
The more I program in Scala, the more convinced I become that it's the grooviest Groovy Language of all. Scala's lexing and syntax need to be more customizable, though, and that's what GroovyScript's all about. It adds an alternative lexer/parser to the Scala parse tree to make the syntax self-referential, thus enabling syntactic macros and keyword aliasing. So a few weeks ago, I decided to switch the primary reference implementation for the Groovy Language from (J)Groovy to GroovyScript. I've still got a lot of learning and work to do, though. I programmed in Groovy for a year before surfacing on their mailing list, and it will probably take that long or longer before I have much to contribute to Scala.
9 July 2009
The grooviest Groovy of all!
I've posted beta-1 of GroovyScript online. It's a lexer and parser with just-in-time pushback lexing, with an Apache licence, written in Scala. When the parser backtracks, it pushes unused tokens back into the lexer. Hence we can write a lexer/parser that enables lexical definitions to be defined as annotations using regexes in the parsed code. The following code snippet parses correctly in beta-1:

    abc;
    @AddComment('//[^\r\n]*') try{
        defg; //hi!!!
        987;
        @Anno try{
            @DoIt(7, 'abc',) zyx;
            16.8
        };
        'bcdefg';
    };
    @Anno @Letter hijk;
    lmnop;

The @AddComment annotation enables //-comments to be recognized as whitespace within its tagged statement, and eventually within all external files parsed from inside it, but not before or after the tagged statement. That's all that's working for now, but I'll eventually put in everything I blogged about in my last post, e.g. custom lexing rules, syntactic macros, name aliasing, a Unicode IME. Scala certainly proved its worth for this challenging exercise. It made me think about the best path forward for GroovyScript, the 3rd language in the Groovy Programming Language family, after (J)Groovy and Groovy-DLR. As a result, I'm switching the GroovyScript target platform from the (J)Groovy AST to the Scala 2.8 parse tree.
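The GroovyScript source isn't reproduced here, so what follows is only a minimal Scala sketch of the general idea. `PushbackLexer`, `addComment`, `attempt` and `choice` are hypothetical names. Tokens are lexed only when the parser asks for them, so a lexical rule added mid-parse (the role `@AddComment` plays above) affects only text not yet lexed. When an alternative fails, its tokens are pushed back to be re-read by the next alternative. For simplicity, tokens are plain strings and alternatives are fixed token sequences.

```scala
import scala.util.matching.Regex

// A hypothetical sketch of just-in-time lexing with token pushback.
object PushbackDemo {
  final class PushbackLexer(input: String) {
    private var pos = 0
    private var pushed: List[String] = Nil      // tokens handed back by the parser, next-to-read first
    private var commentRules: List[Regex] = Nil // extra patterns treated as whitespace

    // Register a comment pattern; it applies only to text lexed from now on
    def addComment(pattern: String): Unit = commentRules = pattern.r :: commentRules

    // Skip whitespace and anything matching a comment rule, until neither applies
    private def skipIgnorable(): Unit = {
      var progressed = true
      while (progressed) {
        progressed = false
        while (pos < input.length && input(pos).isWhitespace) { pos += 1; progressed = true }
        for (r <- commentRules) {
          r.findPrefixMatchOf(input.substring(pos)) match {
            case Some(m) if m.end > 0 => pos += m.end; progressed = true
            case _                    => ()
          }
        }
      }
    }

    // Return the next token: a pushed-back one if any, otherwise lex one on demand
    def next(): Option[String] = pushed match {
      case t :: rest => pushed = rest; Some(t)
      case Nil =>
        skipIgnorable()
        if (pos >= input.length) None
        else if (input(pos).isLetterOrDigit) {
          val start = pos
          while (pos < input.length && input(pos).isLetterOrDigit) pos += 1
          Some(input.substring(start, pos))
        } else { pos += 1; Some(input.substring(pos - 1, pos)) }
    }

    // Hand tokens back in reading order so they are re-read in that order
    def pushBack(tokens: List[String]): Unit = pushed = tokens ++ pushed
  }

  // Try to read exactly `expected`; on a mismatch push everything consumed back and fail
  def attempt(lx: PushbackLexer, expected: List[String]): Boolean = {
    var consumed = List.empty[String] // most recent first
    var ok = true
    val it = expected.iterator
    while (ok && it.hasNext) {
      val want = it.next()
      lx.next() match {
        case Some(t) => consumed = t :: consumed; ok = (t == want)
        case None    => ok = false
      }
    }
    if (!ok) lx.pushBack(consumed.reverse)
    ok
  }

  // Ordered choice: the first matching alternative wins; failed ones leave the token stream intact
  def choice(lx: PushbackLexer, alternatives: List[List[String]]): Option[List[String]] =
    alternatives.find(alt => attempt(lx, alt))
}
```

For example, given the input `try { x }`, the alternative `try (` fails at `{`, both tokens are pushed back, and the alternative `try { x }` then succeeds from the start.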
Why? It happened like this... About a year ago, I started converting some hard-to-debug lexer/parser code written in Groovy to C#, just to code it anew somewhere, hoping to debug the logic. The Visual Studio editor complained about the static types not matching. I fiddled with it until the types matched, then discovered I had also fixed the logic problem. That was when I started to reconsider the supposed benefits of dynamically typed languages. Java-style static typing is verbose, but with type inference it rocks! C# has some type inference, but Scala's is incredible! I now seriously doubt that the benefits of dynamic typing outweigh its costs.
I started off building Groovier for the GrAST in Scala, but discovered Scala itself was already the grooviest language of all! First came C, then C++, then Java, which should really be called C3+. That's because C# came next, its sharp symbol (#) being four plus signs (+) joined together, which makes Scala C5+. Because I'm now more impressed with the Scala language engine than with (J)Groovy's, I've decided to switch GroovyScript's target platform from the GrAST to the Scala 2.8 parse tree. With inferred static typing, it's at a higher level of abstraction than the GrAST. And unlike the GrAST, I can bundle it with GroovyScript because its name is different.
But not only that, Scala's also multi-platform, running on both the JVM and the CLR. Are Codehaus (J)Groovy or Codeplex Groovy-DLR really needed? So as the underwriter for the Groovy Language, I'm also switching from (J)Groovy to GroovyScript as the primary reference implementation for the Groovy Language. The Scala language engine is now the primary platform for the Groovy Language. (J)Groovy was the first language in the Groovy Language family, but GroovyScript will soon be the leading-edge one. GroovyScript will then change its name to Groovy 2.0. Because it's now Apache-licensed, the (J)Groovy developers could adapt it to the GrAST and bundle it with (J)Groovy if they really wanted to.
16 June 2009
Gr8 isn't great, it grates
A few short months after my very first posting to the Groovy Language mailing list, Graeme Rocher changed the name of Groovy on Rails to Grails. It was too late to change Groovy's groovy name as well, but I suspect the Groovy developers will do so by stealth for version 2. As the underwriter of the Groovy Language, I must ensure Groovy's development continues, and do so under its present name.
The Groovy developers recently created the "Gr8 family of technologies" brand, i.e. Groovy / Grails / Griffon / Gant / etc (see http://twitter.com/aalmiray/status/1906155191) in direct response to my blog post at http://gavingrover.blogspot.com/2008/11/groovy-language-family.html. I suspect "Gr8" is also their upcoming name for the dynamic language engine inside Groovy 2.0, to compete with Google's V8 engine inside Chrome JavaScript. Of course, the "Gr8 dynamic language engine" would soon after become an engine for all JVM-based dynamic programming languages, itself a good idea, but the SpringSource developers might then quietly ignore Groovy 2.0 support in favor of other languages running on the Gr8 engine. Like the 5 yr old JSR at http://www.jcp.org/en/jsr/detail?id=241, Groovy 2.0 would become a carcass, its only purpose to prevent anyone else using the brand.
GroovyScript will be a GPL-licensed lexer/parser for the language engine inside the Groovy 2.0 Language. If that engine changes its name, GroovyScript will then be allowed to bundle the engine as part of its distro.
See http://gavingrover.blogspot.com for more details on the Groovy Language.
from http://gavingrover.blogspot.com