Because my blogspot blog has been continuously blocked in the country
I live in since May 2009, I've been posting entries here...
19 March 2010
Strach parser
I've posted a wiki showing some Strach
combinator parsers (written in Groovy++) working with dynamic Groovy's
ASTBuilder.
The operators are only tentative. Here's hoping the Codehaus Groovy
developers will allow some more intuitive operators similar to those in
regex, PEG, or Scala.
8 March 2010
Groovy++
About half a year ago, I began to drift away from
the Groovy 1.x Language, lured by the benefits of Scala. I even
seriously considered moving my Groovy Language brand to a Scala-based
engine, creating the Groovy syntax over that. But lately, I've been
exploring the benefits of Alex Tkachman's Groovy++, with its @Typed and
@Trait keywords, and even static type inference. It's brought my heart
back to the true Groovy Language, with renewed vigor to fulfill the
Groovy Language vision. Scala, like Haskell and Scheme and many other
languages, holds wonders, but thanks to Groovy++, I've left my wayward
path to return to the true Groovy Language, in awe of what other unseen
miracles those AST transformations could bring!
I've turned the Groovy site at Codeplex into the official Groovy
Language dashboard. I hope to expand it and keep it up to date to
acknowledge everyone who's made a significant contribution to developing
the Groovy Language programming ladder.
3 March 2010
Groovy Ceasefire
I have a dilemma when creating a custom
lexer/parser over Groovy's AST. The Groovy Language builders let me
build the AST using the very readable builder syntax, while the Scala
Language combinator parsers let me define my custom syntax using
easy-to-read infix operators. The coming Scala 2.8 upgrades these to the
time-efficient packrat parsers, also allowing left-recursive
definitions enabling declarative definition of expression paths.
But my dilemma is I can't use both styles of syntactic shortcut in
one program. What I want to code is something like this mixed
Groovy/Scala style pseudocode:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.matching.Regex
import org.objectweb.asm.Opcodes._
import org.codehaus.groovy.ast._
import org.codehaus.groovy.ast.stmt._
import org.codehaus.groovy.ast.expr._
import org.codehaus.groovy.ast.builder.AstBuilder
import java.security._
import groovy.lang.GroovyClassLoader
import org.codehaus.groovy.control._
// Pseudocode: Scala combinator parsers whose actions use Groovy builder syntax
object ScalaCallingNodes extends Application with RegexParsers{
  // Lexer-level helpers: each one skips leading whitespace first
  def whitespaceLexer = (" " | "\r\n" | "\r" | "\n")*
  def nameParser = whitespaceLexer ~> "[A-Za-z_][A-Za-z_0-9]*".r
  def nameParser(name:String) = whitespaceLexer ~> name
  def stringParser(open:Regex, close:Regex) =
    whitespaceLexer ~> open ~> ("[^" + close + "]*").r <~ close
  def symbolParser(symbol:String) = whitespaceLexer ~> symbol
  // println 'text' becomes a method-call expression on "this"
  def printlnParser= nameParser("println") ~ stringParser("'".r, "'".r) ^^ {case name ~ paramText=>
    new AstBuilder().buildFromSpec{
      expression{
        methodCall{
          variable "this"
          constant name
          argumentList {
            constant paramText
          }
        }
      }
    }
  }
  def blockParser(open:String, close:String) =
    symbolParser(open) ~> printlnParser <~ symbolParser(close) ^^ {case println=>
      var block= new BlockStatement([], new VariableScope())
      block.addStatement(println)
      block
    }
  def paramParser= symbolParser("(") ~> nameParser("String") ~> symbolParser("[") ~>
    symbolParser("]") ~> nameParser <~ symbolParser(")")
  def mainMethodParser= nameParser("public") ~> nameParser("static") ~> nameParser("void") ~>
    nameParser("main") ~> paramParser ~ blockParser("{", "}") ^^ {case paramName ~ block=>
      var method= new AstBuilder().buildFromSpec{
        method('main', ACC_PUBLIC | ACC_STATIC, Void.TYPE) {
          parameters{
            parameter paramName: String[].class
          }
          exceptions{}
        }
      } // closes buildFromSpec (this brace was missing)
      method.block= block
      method
    }
  def classParser= (nameParser("public") ~> nameParser("class") ~> nameParser <~ symbolParser("{")) ~
    mainMethodParser <~ symbolParser("}") ^^ {case name~method=>
      var classes= new AstBuilder().buildFromSpec{
        classNode name, ACC_PUBLIC, {
          classNode Object
          interfaces{
            classNode groovy.lang.GroovyObject
          }
          mixins{}
          genericsTypes{}
        }
      }
      classes[0].addMethod(method)
      classes[0]
    }
  // Parse a sample class and compile the resulting AST
  var classData= """
    |public class MyClass{
    |  public static void main(String[] args){
    |    println 'Hello, world!'
    |  }
    |}""".stripMargin
  var cn= classParser(new CharSequenceReader(classData)).get
  var cu= new CompilationUnit(CompilerConfiguration.DEFAULT, null, new GroovyClassLoader())
  cu.addClassNode(cn)
  cu.compile()
}
If I code this in Scala, I must build the AST using normal method
calls, and build the arrays and Groovy lists in longhand. If I code this
in Groovy, I must use the quoted prefix-style to call the Scala-based
parsers, and I don't think I can even code the Scala Functions in
Groovy. To handle the complexity of parsing experimental syntax to the
Groovy AST, I need to use both Scala-style infix method calls and
Groovy-style builder syntax so as to bring enough syntactic terseness to
make such large-scale experimentation possible. But they're not both
available in either language.
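To make the infix style concrete, here is a minimal sketch of combinator parsers built on operator overloading. It is written in Python rather than Scala or Groovy, purely as an illustration. The Parser class, the regex helper, and the operator choices (+ for Scala's ~, >> for ~>, << for <~, | for alternation, map for ^^) are assumptions of this sketch, not part of any library mentioned above, and a plain dict stands in for the AST node a Groovy builder would produce.

```python
import re


class Parser:
    """Wraps a function (text, pos) -> (value, new_pos), or None on failure."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, text, pos=0):
        return self.fn(text, pos)

    def __add__(self, other):
        # p + q: run p then q and keep both results (like Scala's ~)
        def seq(text, pos):
            first = self(text, pos)
            if first is None:
                return None
            second = other(text, first[1])
            if second is None:
                return None
            return (first[0], second[0]), second[1]
        return Parser(seq)

    def __rshift__(self, other):
        # p >> q: keep only q's result (like Scala's ~>)
        return (self + other).map(lambda pair: pair[1])

    def __lshift__(self, other):
        # p << q: keep only p's result (like Scala's <~)
        return (self + other).map(lambda pair: pair[0])

    def __or__(self, other):
        # p | q: try p, and fall back to q if p fails
        def alt(text, pos):
            result = self(text, pos)
            return result if result is not None else other(text, pos)
        return Parser(alt)

    def map(self, f):
        # Transform a successful result (like Scala's ^^)
        def mapped(text, pos):
            result = self(text, pos)
            return None if result is None else (f(result[0]), result[1])
        return Parser(mapped)


def regex(pattern):
    """Parser that skips leading whitespace, then matches the pattern."""
    compiled = re.compile(r"\s*(" + pattern + ")")

    def match(text, pos):
        m = compiled.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return Parser(match)


# Hypothetical grammar for: { println 'some text' }
string = regex(r"'[^']*'").map(lambda s: s[1:-1])  # strip the quotes
println_stmt = (regex("println") >> string).map(
    lambda text: {"call": "println", "args": [text]})
block = regex(r"\{") >> println_stmt << regex(r"\}")

source = "{ println 'Hello, world!' }"
ast, end = block(source)
print(ast)
```

The grammar lines at the bottom read almost like the Scala definitions, which is the terseness the infix operators buy. What the sketch cannot show is the builder half of the problem: the dict literal is a stand-in for a nested builder call.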
So what's the solution? After several years of experimentation, I
believe programming languages should be statically typed, with
inference, but allow dynamic typing when required. Running Groovy-style
builders requires dynamic typing, while combinator parsers should be
statically typed for efficiency. Perhaps one day, Scala will bring
optional open classes, enabling builders, but that's probably years
away, if at all. Perhaps someone could strip the dynamic-typing engine
out of Groovy, and retrofit it into Scala, but the Scala developers
would need to tweak the Scala syntax to allow it: this doesn't sound
likely.
On the Groovy side, the best option is for Groovy to have a static
mode, with mixins (traits), pattern matching, and infix symbol
definitions, enabling Scala-style combinator parsers to be defined.
Failing that, infix calls of symbol-named methods would allow me to call
Scala's parsers tersely. Groovy should enable both infix calls of
Scala's symbol-named methods and definitions of Scala Functions soon, so
developers can experiment with how well we can combine combinator
parsers with builders.
But even better is if Groovy gets a statically-typed mode. It just so
happens Alex Tkachman is building Groovy++, a statically-typed
JVM-based language with Groovy's syntax. He once wrote he thought a
static version of Groovy should have Scala-style traits. Groovy 1.x
project leader Guillaume Laforge wants pattern matching in dynamic
Groovy, and if it comes, Groovy++ would follow suit to maintain
syntactic compatibility. Groovy would then only need to enable
symbol-named methods and infix calls of them, so someone could clone the
Scala packrat combinator parsers in Groovy++.
The symbols the Scala developers have chosen for their combinator
parsers have the potential to become standardized. Many of them follow
the (de facto) standardized regex symbology, and regexes can easily be
embedded between the parser symbols: the regexes and parsers working
together would enable a terse context-sensitive parsing language which
could become a standard. The Groovy Language should support this effort
by using the same symbols for its combinator parsers.
When I'm able to tersely packrat-parse code and build a Groovy AST
using builders, I can then easily experiment with designing my own
custom language syntax for Groovy. A few years ago, I wrote some online
notes for Java newbies learning Groovy, intending to discover the most
commonly used JDK methods, so I would know which ones to create
syntactic shortcuts for. And of course, I want to use CJK tokens in the
custom syntax, for even more terseness.
Call for a ceasefire
But there's one major problem: the Groovy 1.x and Groovy++ project
managers are at odds with each other, jockeying to protect and promote
their positions in the Groovy ecosystem hierarchy! This behavior isn't
good for the Groovy Language. The dynamic Groovy 1.x and the static
Groovy++ need to merge into one distro, and both project managers need
to give a little so the Groovy Language can move forward. I feel a
little responsible in calling for this to happen because of my unique
association with the Groovy Language brand. When I first came across
Groovy 1.x, I assumed the role of underwriter. But lately, it's finally
dawned on me that not only am I the underwriter for Groovy, but I
actually own the Groovy Language brand. On that infamous day in Dec 2005
when James Strachan left the developer team, the project management
baton passed to Guillaume Laforge, but the brand ownership passed to me!
So I'm calling on the Codehaus despots and Alex Tkachman to think of
the future of the Groovy Language in their negotiations. Its future is
more important than any petty quarrels over financial placing in the
ecosystem. I'm the Groovy Language brand-owner, and Groovy's my middle
name, so I'm more interested in advancing the potential of Groovy than
milking the largest possible share of the pie, and I'm calling for all
parties to think of the future of the groovy Groovy Language, both
static and dynamic, so developers everywhere will marvel at its power.
After Groovy++ and Groovy 1.x are shipping in one distro, Groovy 1.x
can then be written in Groovy++. Groovy will soon after bring mixins and
pattern matching, someone will then clone Scala's packrat parsers,
Groovy will enable infix method calls, then I and others can experiment
with terse parsing and AST building, creating a terser programming
syntax to the Groovy AST, using all Unicode characters! Groovy's future
is on the line: can you see the vision?
2 February 2010
Groovy being GPL'd?
In an interview with Andres
Almiray, Groovy 1.x developer Alex Tkachman explains his latest
creation, Groovy++, a static add-on to the dynamic Groovy Language 1.x.
Simply by adding the @Typed annotation, the annotated code will compile
statically. I've been wanting this addition to the Groovy Language for a
long time, and in my 9 July 2009 blog entry, even talked of moving
Groovy's primary implementation from the Groovy 1.7 AST to the Scala 2.8
parse tree. Alex called this crazy talk from a crazy person, but then
he saw the light and began creating the statically-compiled Groovy++.
Alex writes in the interview:
There are two issues here, which prevent us from open-sourcing the
compiler immediately. First of all, it uses several pieces of
technology, which our company uses and plans to use in our commercial
products. It was not critical when project started as experiment but now
we need to extract these parts and replace/rewrite with proper
open-source alternatives. The second problem is interesting by itself.
We are talking with several well-known vendors about their involvement
with the project. There is no much sense in finalizing exact OSS license
before these discussions are not completed and we are sure that all
interests are well covered. Something interesting is coming and I wish I
could tell you more right now.
What are these several pieces of non-open-sourcable technology? And
how did Alex code it all up so quickly? Did he use Sun's OpenJDK as the
base for Groovy++ ? Is Groovy++ simply a more deeply embedded
reincarnation of his joint Groovy/Java compiler? Is he threatening to
use this irresistible update to Groovy 1.x to fork and take control of
the primary implementation of the Groovy Language, even threatening to
GPL it? Are these discussions with EMC/VMware/SpringSource and
Oracle/Sun???
Groovy 2.0 update
As for my own current experimentation, I'm attempting to build a
reasonably fast lexer/parser using Scala 2.8's packrat combinator
parsers and the Groovy 1.7 ASTBuilder. The different code elements (e.g.
statements, expressions) must be pluggable into the lexer/parser,
without nls! everywhere to cater for the implied semi-colons. The format
of different lexical elements (e.g. strings, GStrings, even dates) must
be definable by annotations in the syntax. I'm hoping the
left-recursion allowed by the new Scala 2.8 packrat parsing will even
let us plug in path elements in path expressions.
Of course, Scala's parsers, with their non-alphanumeric names, don't
look very elegant when called from Groovy code, but I can't use Groovy's
elegant ASTBuilder calls from within Scala code. A Catch-22 ! Perhaps
one day Groovy will allow Scala's non-alphanumerically-named methods to
be called elegantly from within Groovy, so they look like operator
calls? Or perhaps one day Scala will allow dynamic variables, like C#
4.0, so we can create builders within Scala, making writing to its AST
as elegant as in Groovy and Ruby. Building HTML with builders is more
elegant than Scala's embedded HTML text.
My progress in building a new lexer/parser for Groovy 1.x isn't as
spectacular as Alex's progress with Groovy++ (assuming he didn't use
OpenJDK), but I'm getting there. The Groovy Language must be fully
configurable so developers can use any natural language they want in the
syntax.
23 January 2010
Groovy Dilemma
In chapter 7 of Steven Pinker's 1994
book The Language Instinct, he gives an example of a perfect
right-branching sentence:
Remarkable is the rapidity of the motion of the wing of the
hummingbird.
This is parsed in the human brain as shown by the
parentheses:
(Remarkable (is (the (rapidity (of (the (motion (of (the (wing (of
(the (hummingbird))))))))))))).
"Remarkable" is the subject, the remainder
is the predicate. "Is" is the main verb, the remainder is its object
(here, called the complement). "The" is the article, the remainder is its
referent. "Rapidity" is a phrasal head, the remainder is a prepositional
phrase as tail. "Of" is a preposition, the remainder is its tail in the
phrase. And so on. Pinker gives another example easy for the brain to
parse, one that includes relative and subordinate clauses:
(He gave (the candy (to the girl (that (he met (in New York) while
(visiting his parents (for ten days (around Christmas and New
Year's)))))))).
He rearranges it so it's far harder for our minds to
parse:
(He gave (the girl (that (he met (in New York) while (visiting his
parents (for ten days (around Christmas and New Year's)))))) the
candy).
The direct object "the candy" after the many closing parentheses
forces our short-term memories to keep track of dangling phrases that
need particular words to complete them. It seems our brains, unlike
computers, can only remember a few dangled branches when parsing
sentences.
Perhaps that's why the Lisp code that's easiest for humans to read
ends with many closing parens, such as this recursive sample from
chapter 2 of Paul Graham's On Lisp:
(defun our-length (lst)
  (if (null lst)
      0
      (1+ (our-length (cdr lst)))))
Left-branching sentences are also
easy for humans to parse. Pinker gives another example with two
arrangements, one harder for humans to parse:
((The rapidity (of the motion (of the wing (of the hummingbird)))) is
remarkable).
and the other, a perfect left-branching sentence, easy:
(((((The hummingbird)'s wing)'s motion)'s rapidity) is
remarkable).
English has just a few left-branching structures, but some
languages, such as Japanese, are primarily based on them.
One of the universals in Universal Grammar theory, which both Pinker
and Noam Chomsky support, is that if a language has verbs before
objects, as English does, then it uses prepositions, while if a language
has objects before verbs, as Japanese does, it uses postpositions.
Pinker mentions a possible reason this universal holds is so the
language can enforce a consistent branching decision, either
left-branching or right-branching, so our brains can parse it easily.
Some grammatical English sentences are impossible for our brains to
parse simply because there are too many dangling branches. The first of
these examples parses in our brains OK, but the other two simply don't
parse:
(The rapidity (that the motion has) is remarkable).
(The rapidity (that the motion (that the wing has) has) is remarkable).
(The rapidity (that the motion (that the wing (that the hummingbird
has) has) has) is remarkable).
They do parse in computer languages,
though. When I discovered closures in Groovy, I started using this type
of unreadable embedding, but I now realize I should be making my code
either left-branching or right-branching to make it more readable.
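As a rough illustration of the three shapes in code, the sketch below computes the same result three ways. It is in Python rather than Groovy, and every helper name in it (keep, transform, order, their function-first twins, and Chain) is hypothetical, invented for this comparison only.

```python
def keep(items, pred):
    return [x for x in items if pred(x)]

def transform(items, fn):
    return [fn(x) for x in items]

def order(items, key):
    return sorted(items, key=key)

def is_long(word):
    return len(word) > 4

words = ["wing", "hummingbird", "motion", "rapidity"]

# Centre-embedded: each outer call still owes an argument after the
# nested call closes, like "(that the motion (that the wing has) has)".
embedded = order(transform(keep(words, is_long), str.upper), len)

# Right-branching: function-first variants put the nested call last,
# so all the closing parens pile up at the end, as in the Lisp sample.
def keep_if(pred, items):
    return keep(items, pred)

def transform_with(fn, items):
    return transform(items, fn)

def order_by(key, items):
    return order(items, key)

right = order_by(len, transform_with(str.upper, keep_if(is_long, words)))

# Left-branching: a method chain, like "the hummingbird's wing's motion".
class Chain:
    def __init__(self, items):
        self.items = list(items)

    def keep(self, pred):
        return Chain(keep(self.items, pred))

    def transform(self, fn):
        return Chain(transform(self.items, fn))

    def order(self, key):
        return Chain(order(self.items, key))

left = Chain(words).keep(is_long).transform(str.upper).order(len).items

print(embedded == right == left)
```

All three produce the same list; only the first forces the reader to remember which call each trailing argument belongs to.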
16 January 2010
Programming Language Structure
Programming languages
have their origin in natural language, so to understand the structure of
computer languages, we need to understand natural ones. According to
Systemic Functional Grammar (SFG) theory, to understand the structure of
language, we need to consider its use: language is as it is because of
the functions it's required to serve. Much analysis of the English
language has been performed using these principles, but I haven't found
much on programming languages.
Functional grammar of natural languages
According to M.A.K. Halliday's SFG, the vast numbers of options for
meaning potential embodied in language combine into three relatively
independent components, and each of these components corresponds to a
certain basic function of language. Within each component, the networks
of options are closely interconnected, while between components, the
connections are few. He identifies the "representational" and
"interactional" functions of language, and a third, the "textual"
function, which is instrumental to the other two, linking with them,
with itself, and with features of the situation in which it's used.
To understand these three components in natural languages, we need to
understand the stages of encoding. Two principal encodings occur when
speech is produced: the first converts semantic concepts into a
lexical-syntactic encoding; the second converts this into spoken sounds.
A secondary encoding converts some semantics directly into the vocal
system, being overlaid onto the output of the lexical-syntactic
encoding. Programming languages have the same three-level encoding: at
the top is the semantic, in the middle is the language syntax, and at
the bottom are the lexical tokens.
The representational function of language involves encoding our
experience of the outside world, and of our own consciousness. It's
often encoded in as neutral a way as possible for example's sake: "The
Groovy Language was first officially announced by James Strachan on
Friday 29 August 2003, causing some to rejoice and others to tremble."
We can analyze this as two related processes. The first has actor
"James Strachan", process "to officially announce", goal "the Groovy
Language", instance circumstance "first", and temporal circumstance
"Friday 29 August 2003"; the second process is related as an effect in a
cause-and-effect relationship, being two further equally conjoined
processes: one with process "to rejoice" and actor "some"; the other
with process "to tremble" and actor "others".
The interactional function of language involves injecting the
language participants into the encoding. A contrived example showing
many types of injects: "The Groovy Language was first announced by, of
all people, creator James Strachan, sometime in August 2003. Was it on
Friday 29th? Could you tell me if it was? Must have been. That august
August day made some happy chappies like me rejoice, didn't it?,
yeehaaaah, and probably some other unfortunates to tuh-rem-ble,
ha-haaah!"
We see an informal tone, implying the relationship between speaker
and listener. There are glosses added, i.e. "of all people", "august",
"happy chappies like me", "unfortunates", semantic words added, i.e.
"creator", semantic words removed, i.e. "officially", sounds inserted,
i.e. "yeehaaaah", "ha-haaah", prepended expressions of politeness, i.e.
"Could you tell me if", and words spoken differently, e.g.
"tuh-rem-ble". Mood is added, i.e. a sequence of (indicative,
interrogative, indicative). Probability modality is added, i.e. "must
have", "probably". We could have added other modality, such as
obligation, permission, or ability. We've added a tag, i.e. "didn't
it?". We could have added polarity in the main predicate. What we can't
indicate in this written encoding of speech is the attitudinal
intonation overlaid onto each clause, of which English has hundreds.
Neither can we show the body language, also part of the interactional
function of speech.
Natural language in the human brain
A recent article in Scientific American says biologists now believe
the specialization of the human brain’s two cerebral hemispheres was
already in place when vertebrates arose 500 million years ago, and that
"the left hemisphere originally seems to have focused in general on
controlling well-established patterns of behavior; the right specialized
in detecting and responding to unexpected stimuli. Both speech and
right-handedness may have evolved from a specialization for the control
of routine behavior. Face recognition and the processing of spatial
relations may trace their heritage to a need to sense predators
quickly."
I suspect the representational function of language is that which is
produced by the left hemisphere of the brain, and the interactional
function by the right hemisphere. Because the right side of the brain is
responsible for unexpected stimuli, from both friend and foe, then
perhaps interactional language in vertebrates began as body language and
facial expressions to denote conditions relevant to others, e.g. anger,
fear, affection, humidity, rain, danger, etc. Later, vocal sounds arose
as the voice box developed in various species, and in humans,
increasingly complex sounds became possible. The left side of the brain
is responsible for dealing with regular behavior, and so allowed people
to use their right hand to make sign language to communicate.
Chimpanzees and gorillas use their right hands to communicate with each
other, often in gestures that also incorporate the head and mouth. The
article hypothesizes that the evolution of the syllable in humans
triggered the ability to form sentences describing processes involving
people, things, places, times, etc. Proto-representational language was
probably a series of one-syllable sounds similar to what some chimps can
do nowadays with sign language, e.g. "Cat eat son night". Later, these
two separate functions of natural language intertwined in human
speech.
Programming language structure
When looking at programming languages, we can see the
representational function easily. It maps closely to that for natural
languages. The process is like a function, and the actor, goal,
recipient, and other entities in the transitive structure of natural
language are like the function parameters. In the object-oriented
paradigm, one entity, the actor, is like the object. The circumstances
are the surrounding static scope, and the relationships between
processes are the sequencing of statements. Of course, the semantic
domains of natural and programming languages are different: natural
languages talk about a wider variety of things, themselves more vague,
than programming languages. But the encoding systems are similar: the
functional and object-oriented paradigms became popular for programming
because between them it's easy for programmers to code about certain
aspects of things they use natural language to talk about. The example
in pseudocode:
Date("2003-8-29").events += {
  def a = new Instances();
  a[1] = jamesStrachan.officiallyAnnounce(Language.GROOVY);
  a[1].effect = [some: s => s.rejoice(), others: o => o.tremble()];
}
The similarities between the interactional functions of natural and
programming languages are more difficult to comprehend. The major
complication is the extra participants in programming languages. In
natural language, one person speaks, maybe one, maybe more people
listen, perhaps immediately, perhaps later. Occasionally it's intended
someone overhears. In programming languages, one person writes. The
computer reads, but good programming practice is that other people
read the code later. Commenting, use of whitespace, and variable naming
partly enable this interactional function. So does including test
scripts with code. Java/C#-style exception-handling enables
programmer-to-programmer interaction similar to the probability-modality
of English verbal phrases, e.g. will/definitely, should/probably,
might/could/possibly, won't, probably won't.
Many programming systems allow some interactional code to be
separated from the representational code. One way is using system-wide
aspects. A security aspect will control the pathway between various
humans and different functions of the program while it's running.
Aspects can control communication between the running program and
different facets of the computer equipment, e.g. a logging aspect comes
between the program and recording medium, a persistence aspect between
the program and some storage mechanism, an execution performance aspect
between the program and CPU, a concurrency aspect between the program
and many CPUs, a distribution aspect between the program and another
executing somewhere else. Here, we are considering these different
facets of the computer equipment to be participants in the
communication, just like the programmer. Aspects can also split out code
for I/O actions and the program entry point, which are program-to-human
interactions. This can also be done by monads in "pure functional"
languages like Haskell. Representational function in Haskell is always
kept separate from interactional functions like I/O and program entry,
with monads enabling the intertwining between them. Monads also control
all access between the program and modifiable state in the computer,
another example of an interactional function.
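A logging aspect of the kind described above can be approximated in a few lines. The following is only a sketch, in Python rather than Groovy, using a decorator as a stand-in for a real aspect framework; the names logged, call_log, and officially_announce are hypothetical, and the list takes the place of the recording medium.

```python
import functools

call_log = []  # stands in for the recording medium


def logged(fn):
    """Interactional concern: record each call, without touching fn's body."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        call_log.append(f"{fn.__name__}{args} -> {result!r}")
        return result
    return wrapper


@logged
def officially_announce(actor, language):
    # Representational concern only: who announces what
    return f"{actor} announces {language}"


message = officially_announce("James Strachan", "Groovy")
print(message)
print(call_log)
```

The function body says nothing about logging, and the logging code says nothing about announcements, so either can change without the other.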
Textual function of language
The textual function of language in SFG is that which concerns the
language medium itself. In spoken natural language, this is primarily
the sequential nature of voice, and in written language, the 2-D form of
the page. Whereas in natural language theory, the voice-carrying
atmosphere and the ink-carrying paper are obviously mediums and not
participants, it's more difficult to categorize the difference between
them in programming language theory. Because a program is written as
much for the CPU as for other human readers, if not more so, we could
call the CPU a participant. But then why can't the CPU cache, computer
memory, hard-disk storage, and comms lines also be called participants?
Perhaps the participants and the transmission medium for natural
languages are also more similar than different.
The textual function of language is made up of the thematic,
informational, and cohesive structures. Although mainly medium-oriented,
they also involve the participants. The thematic structure is
speaker-oriented, the informational structure is listener-oriented. The
thematic structure is overlaid onto the clause. In English, what the
speaker regards as the heading to what they're saying, the theme, is put
in first position. Not only clauses, but also sentences, speech acts,
written paragraphs, spoken discourses, and even entire novels have
themes. Some examples using lexical items James, to give, programmers,
Groovy, and 2003, with theme in italics:
James Strachan gave programmers Groovy in 2003.
Programmers are who James gave Groovy to in 2003.
The Groovy Language is what James gave programmers in 2003.
2003 is when James gave programmers Groovy.
Given was Groovy by James to programmers in 2003.
In English, the Actor of the
representational function's transitive structure is more likely to be
separated from the interactional function's Subject and from the Theme
in a clause than those two are from each other. I think the textual functions
of natural language are far more closely linked to the interactional
function than to the representational. Perhaps the right side of the
brain also processes such textual structure.
The informational structure jumps from the top (i.e. semantic)
encoding level directly to the bottom (i.e. phonological) one in
English, skipping the middle (i.e. lexical/syntactic) level. This is
mirrored by how programming languages such as Python use the lexical
tokens to directly determine semantic meaning. In English, the speech is
broken into tone units, separated by short pauses. Each tone unit has
the stress on some part of it to indicate the new information. For
example, each of these sentences has a different informational meaning
(the bold indicates the stresses):
James gave programmers Groovy in 2003.
James gave programmers the Groovy Language in 2003.
James gave programmers Groovy in 2003.
James gave programmers Groovy in 2003.
James Strachan gave programmers Groovy in 2003.
Unlike the thematic structure, the informational structure organizes
the tone unit by relating it to what has gone before, reflecting what
the speaker assumes is the status of the information in the mind of the
listener. The informational structure usually uses the same structure
used in the thematic, but needn't. English grammar allows the lexical
items to be arranged in any order to enable them to be broken up in any
combination into tone units. For example, these examples restructure the
clause so it can be divided into two tone units (shown by the comma),
each with its own stress, so two items of new information can be
introduced in one clause:
James gave Groovy to programmers, in 2003.
As for Groovy, James gave it to programmers in 2003.
In 2003, James gave programmers Groovy.
Programming languages should follow the example of natural languages,
and allow developers to structure their code to show both thematic and
informational structure. The final textual function, the cohesive
structure, enables links between clauses, using various techniques, such
as reference, pronouns, and conjunctions. Imperative programming
languages rely heavily on reference, i.e. temporary variables, but don't
use pronouns very much. Programming languages should also provide
developers with many pronouns.
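One small mechanism that already gives a little thematic freedom is the keyword argument, which lets the writer choose which participant comes first without changing what is represented. The sketch below is in Python rather than Groovy, and the give function and its parameter names are hypothetical, chosen to mirror the example sentences.

```python
def give(*, actor, recipient, goal, year):
    """Return the same representational content whatever the argument order."""
    return {"process": "give", "actor": actor, "recipient": recipient,
            "goal": goal, "year": year}


# Theme is the actor: "James gave programmers Groovy in 2003."
actor_first = give(actor="James", recipient="programmers",
                   goal="Groovy", year=2003)

# Theme is the time: "In 2003, James gave programmers Groovy."
year_first = give(year=2003, actor="James", recipient="programmers",
                  goal="Groovy")

# Theme is the goal: "As for Groovy, James gave it to programmers in 2003."
goal_first = give(goal="Groovy", actor="James", recipient="programmers",
                  year=2003)

print(actor_first == year_first == goal_first)
```

This covers only the ordering of participants within one call; it does nothing for informational stress or for pronoun-like cohesion between statements.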
Summary
Programming languages initially represented information in the same
way humans do, using transitive structures such as function calls,
joined by logical relationships such as blocks and class definitions.
Interactional aspects of code were initially intertwined, but could be
separated out using aspects and monads. Enabling different textual
structures in programs isn't very widespread, so far limited to
providing different views of an AST in an IDE, only occasionally
allowing "more than one way to do things" at the lexical level. When
used well, textual structures in code enable someone later on to more
easily read and understand the program.
In promoting the benefits of programming languages enabling different
textual structures, I think it's useful to narrow down to two primary
structures: the transitive and the thematic, as these two are easiest to
communicate to programmers. See my earlier thoughts on how a
programming language can enable more thematic variation. Programming
languages of the future should provide the same functions for
programmers that natural languages provide for humans.
And of course, I'm building Groovy 2.0, which will both enable
thematic variation in the language syntax/morphology, and supply a vast
vocabulary of Unicode tokens for names. The first iteration will use
Groovy 1.x's SwingBuilder and ASTBuilder, along with my own Scala-based
combinator parsers, to turn Groovy 2.0 source into Groovy 1.x bytecode.
The accompanying Strach IME will enable programmers to enter the Unicode
tokens intuitively. Groovy 2.0 will break the chains of the
Antlr/Eclipse syntactic bottleneck over Groovy 1.x !!!
1 January 2010
Bust Groovy open. Set it free!
Ever since the core
dynamicity and syntactic enhancements of Groovy 1.0 beta 1 over Java,
the Groovy Language has been adding functionality upon functionality.
The Groovy developers at Codehaus have taken one core technology that
benefits Java developers, i.e. the meta-object protocol, and used it as a
hook to hang on a closetful of their own versions of widely available
JVM-based technologies. Groovy 2.0 at Codeplex will be the version of
Groovy that strips away such functionality, aiming to provide developers
with the features of Groovy that benefit us the most. The bundled tools
will be dropped, as they duplicate functionality available in other JVM
languages. The Antlr-based lexer/parser will be removed, so we can
interact with the AST directly from other languages. The DGM (default
Groovy methods) will be stripped out, so Groovy AST users can instead
use the richer classes from languages like Scala.
Lately I've discovered the Groovy AST is inconsistent in its
functionality: while earlier-coded syntactic functionality is done above
the AST level, much of the more recently coded functionality that could
be done above the AST is instead done under its hood. In this way, as
well as by supplying an ASTBuilder that can only be used from within the
Groovy Language itself, the Codehaus cartel are tying programmers in to
all their own added cruft, so they can sell us the book and charge us
consulting fees down the line. I'll dig under the AST and scalpel out
any function that can be done in other languages. When Java 7 brings
closures, I'll totally replace Groovy's implementation with Java's.
During 2010, I intend to free the Groovy Language from its Codehaus
chains, to bust it open, to reveal its core essence, the kernel that
most benefits, so programmers can use it simply from other superior
JVM-based languages, so my own Strach IME and lexer/parser can use it,
providing developers with a terse grammar that uses all Unicode tokens
in its vocabulary. Beginning with Groovy 1.8 beta 1, I'll soon after
release a stripped-down version consisting only of the core essence, an
AST directly controlling the MOP, to provide the JVM's answer to
Microsoft's DLR, an AST all JVM language implementers can build a
dynamic language on top of. I'll develop a process so with each Groovy
release, I can quickly release that release's MOP as a standalone. The
Groovy Language Runtime will move from "open" source to open source. Set
Groovy free!
23 December 2009
"Groovy 2010" coming???
Groovy 1.7 is out, "in time
for Christmas", and planning for v 1.8 has begun, including a new module
system.
Groovy 1.x project leader Guillaume Laforge says "we would like to
make a first beta of 1.8 in February or so, with a target final date for
the end of the year - we love Christmas gifts". Does that mean he's
going to rename it "Groovy 2010"? He once threatened to rebrand Groovy
1.6 as "GroovyX". Microsoft once tried that trick with Windows, but have
since reverted to numeric versioning. Let's hope the Codehaus
developers don't learn that lesson the hard way.
The new module system proposes putting Swing, XML, SQL, JMX, Beans,
etc into separate modules, but the core will still be a tangled ball of
many functionally different components. Everything that sits above the
AST could be separated out, to encourage developers to put their own
syntax on top of the AST. I'm experimenting with an alternative
lexer/parser, called the "Strach" component of Groovy 2.0, aiming for
greater tersity, yet retaining clarity. Experimentation is good for
programming languages.
Another separable component is the Default Groovy Methods (DGM).
These methods are compulsory in Groovy: if you want to use Groovy's
meta-object protocol (MOP), you must also use these methods. The
meta-object protocol allows programmers to add, and subsequently remove,
methods on the fly, but forces these default methods on us. What if we
just want to use the MOP, without the DGM? I'll be providing a
component, to be called the "Wilson" component, that gives the option
not to add those methods to classes, and will even let us hide default
Java methods.
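To make the MOP/DGM distinction concrete, here's a minimal sketch in Groovy 1.x-style code. The Account class and its deposit method are hypothetical names made up for this illustration; only metaClass, collect, and GroovySystem.metaClassRegistry come from Groovy itself. It shows a method being added through the MOP, then removed again, while a default Groovy method such as collect is simply there whether we asked for it or not.

```groovy
// Hypothetical class, used only for this illustration.
class Account {
    int balance = 0
}

// Add a method on the fly through the meta-object protocol.
Account.metaClass.deposit = { int amount ->
    delegate.balance += amount
    delegate
}

def account = new Account().deposit(25)
println account.balance

// A default Groovy method (DGM): present on a plain JDK List without opting in.
def doubled = [1, 2, 3].collect { it * 2 }
println doubled

// Remove the runtime addition by discarding the customised metaclass.
GroovySystem.metaClassRegistry.removeMetaClass(Account)

def removed = false
try {
    new Account().deposit(1)
} catch (MissingMethodException e) {
    removed = true   // deposit no longer exists on fresh instances
}
println removed
```

The added method can be taken away again, but there's no equivalent switch in this sketch for taking collect off List, which is the gap the Wilson component is meant to fill.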
Programmers aren't silly: Why can't we use the feature of the Groovy
Language that really benefits us, i.e. the MOP, without having other
cruft shoved on us as well, such as the syntax and DGM? Groovy 2.0 will
sit atop Groovy 1.7, giving more choices to programmers.
Appendix
Here's the list of AST nodes used in Groovy 1.7, with
indenting showing implementation inheritance, that Strach will free up
for developers to use directly:
ASTNode
  AnnotatedNode
    ClassNode
      InnerClassNode
        InterfaceHelperClassNode
      MixinNode
    MethodNode
      ConstructorNode
    FieldNode
    ImportNode
    PackageNode
    Parameter
    PropertyNode
    expr/Expression
      expr/ConstantExpression
        expr/AnnotationConstantExpression
      expr/BinaryExpression
        expr/DeclarationExpression
      expr/TernaryExpression
        expr/ElvisOperatorExpression
      expr/BooleanExpression
        expr/NotExpression
      expr/TupleExpression
        expr/ArgumentListExpression
      expr/PropertyExpression
        expr/AttributeExpression
      expr/ListExpression
        expr/ClosureListExpression
      expr/MapExpression
        expr/NamedArgumentListExpression
      expr/ArrayExpression
      expr/BitwiseNegationExpression
      expr/CastExpression
      expr/ClassExpression
      expr/ClosureExpression
      expr/ConstructorCallExpression
      expr/EmptyExpression
      expr/FieldExpression
      expr/GStringExpression
      expr/MapEntryExpression
      expr/MethodCallExpression
      expr/MethodPointerExpression
      expr/PostfixExpression
      expr/PrefixExpression
      expr/RangeExpression
      expr/RegexExpression
      expr/SpreadExpression
      expr/SpreadMapExpression
      expr/StaticMethodCallExpression
      expr/UnaryMinusExpression
      expr/UnaryPlusExpression
      expr/VariableExpression
  stmt/Statement
    stmt/AssertStatement
    stmt/BlockStatement
    stmt/BreakStatement
    stmt/CaseStatement
    stmt/CatchStatement
    stmt/ContinueStatement
    stmt/DoWhileStatement
    stmt/EmptyStatement
    stmt/ExpressionStatement
    stmt/ForStatement
    stmt/IfStatement
    stmt/ReturnStatement
    stmt/SwitchStatement
    stmt/SynchronizedStatement
    stmt/ThrowStatement
    stmt/TryCatchStatement
    stmt/WhileStatement
  ModuleNode
  GenericsType
  AnnotationNode
15 December 2009
Try Groovy, or is it try{Groovy}catch(Exception e){} ???
Run this code in Groovy 1.6 beta 2 or earlier:
try{def a= "abc"; println a}
try{def a= 123; println a+2}
The result:
abc
125
Now run it in Groovy 1.6 RC 2. The result:
org.codehaus.groovy.control.MultipleCompilationErrorsException:
startup failed: A try block must have at least one try or finally
block.
A standalone try block is great for limiting the scope of common
temporary variables. Groovy enabled them in versions 1.0 and 1.5, a
great improvement over Java. Scala also enables such standalone try
blocks, even in the upcoming Scala 2.8.
But in this mailing list reply to me, Guillaume Laforge writes the preferred way is to use labelled blocks, i.e.
unreferencedUselessLabel: {def a= "abc"; println a}
unreferencedUselessLabel: {def a= 123; println a+2}
What yukky syntax! Perhaps a better change would have been to increase
syntactic elegance and tersity by eliminating braces when only one
statement is in the block, just like with if and while statements:
try println "abc"
try println 123 + 5
Not only that, but in someone's eagerness to
restrict programmer choices, they didn't bother checking the error
message: "A try block must have at least one try or finally block." (To
be fair, the message has since been corrected for Groovy 1.7.)
As the Groovy Language Underwriter, my job is to be ready to continue
Groovy Language development should the developers at Codehaus abandon
it or change its name. However, they actually seem to be gradually
removing Groovy developers' choices through stealth, dumbing down the
syntax, especially now SpringSource and EMC/VMware are bankrolling
Groovy Language development, or perhaps bankrolling the lack of it.
Part of my mission in creating an alternative lexer/parser for the
Groovy AST is to bring back programmer choices when utilizing the Groovy
AST. To quote another freedom fighter, removing programmer power from
the Groovy Language syntax is something up with I will not put.
4 December 2009
Groovy 2.0 status report
I grew up in Auckland, New Zealand, living there for 30 years, but
never considered creating a scripting language until I'd moved to
Melbourne, Australia 10 years ago.
There, I lived in the CBD, an 8-by-8 grid of blocks known as the Hoddle
Grid. I went for many walks while there, often thinking of that CBD as a
huge chessboard on which to play out games. (I had a passing interest
in chess when I was a kid.) The apartment I lived in was on a popular
block of the grid, being where the commercial, entertainment, and
recreational precincts meet. In the chessboard analogy, that block is
the starting position for the white king.
There, I ran a company, GroverServer Ltd, dayjobbing as a programmer
to raise funds, while working on a process to model company annual
reports as Access databases. Although I'd used Access many times since
its v1.0 release, when using it for a real-world complex business task,
it wasn't flexible enough. I eventually concluded the VBA scripting
language was nothing but a marketing con. The true genesis of Groovy
happened soon after this, during a trip I made to England in Feb 2002. I
decided a scripting language should run on a VM, Java's being the
leading one at the time. It should have a flexible syntax, certainly not
like VB's, and must enable AspectJ-style interceptions and
introductions. Of course, I never met Groovy Language creator James
Strachan while there. He began building Groovy 1.x soon after this, yet
another of his many open source projects, recruiting the developers who
now control it.
I stayed silent on the mailing list for a year, not wanting them to
change the name while they still could easily. But then came that
infamous day after DevCon2 in Dec 2005 when James left the development
team. Soon after, I posted my very first posting to the Groovy Language
mailing list. Six weeks later, Graeme Rocher changed the name of _Groovy
on Rails_ to Grails, but it was too late for them to change Groovy's
groovy name as well. By then I had learned enough about Groovy to
continue its development should the developers abandon it or change its
name: I had become Groovy's underwriter.
I decided to get more involved in the language, first by submitting
bug reports and change requests. My very first request was for a groupBy
method I'd found useful for munging data. Guillaume Laforge must have
also thought it was a good method to have, because he created his own
similar request a few months later with the same method name. After many
similar happenings, I began to realize the Codehaus Groovy developers
didn't want me around.
So I decided to branch off on my own, using the Groovy Language AST
as an engine to power a different configurable programming language
syntax and IME. While working on this, I discovered why statically-typed
languages are better than dynamically-typed ones for large systems. So I
switched to programming in Scala. At first, I thought I could use the
Scala AST as a target instead of Groovy's, but I now realise dynamicity
is essential in those few use cases that require it, so I'm back to
targeting the Groovy AST, using the Groovy 1.7 ASTBuilder, created by
Hamlet D'Arcy. For the static mode, I'll still have to target the Scala
AST, though the ideal solution is if Groovy added a static mode.
However, developer Alex Tkachman seems to be vetoed by project manager
Guillaume Laforge on this. Years ago, the Codehaus Roadmap for Groovy
3.0 was for it to be written in Groovy, which would have required a
static mode, but this idea seems to have been trashed.
I'll still be using Scala as the systems language in building Groovy
2.0. The Codeplex Groovy site will initially distribute "Groovy 1.7 with
Strach", where Strach is presently just the lexer/parser, written in
Scala, targeting the Groovy 1.7 AST. Eventually, it'll also target the
Scala 2.8 parse tree for the static mode (unless, of course, Alex
Tkachman succeeds in putting a statically-typed mode into Groovy). When
distributing Groovy 1.7 with Strach, I'll experiment with replacing
selected java-source classfiles from the Groovy 1.7 jar file with my own
scala-source ones. Given enough time, I could even manage to totally
rewrite Groovy in Scala using this process!
31 October 2009
Strach IME and Groovy 2.0 making progress
The webpage for the Strach IME has been created. The Groovy page at
Codeplex has been repurposed as the primary distro site for the Groovy
Language 2.0 next year. Further details are on each of those webpages.
27 September 2009
Scala's groovy stairway
Paul Graham writes in Revenge
of the Nerds how Lisp and Fortran are the trunks of two separate
evolutionary trees in programming language evolution. He then lists
various features of Lisp which have been making their way into languages
in the Fortran language tree, including dynamic typing. Having tried
out many programming languages over the past few years, I now see
programming language evolution differently. Unlike Paul, I see dynamic
typing as being a lack of a feature: Static typing is the true feature.
Furthermore, Lisp macros can be thought of as a low-level feature
comparable to goto statements and pointers. All three can be abstracted
over with higher-level abstractions. Let's look at some abstracted-away
low-level features...
Gotos and breaks The original programming language was of course
assembly language. Assembly had the same basic features as machine code,
only a little more readable. We could branch to another part of the
code based on a data value: we could use this to implement conditionals
and looping. Algol enabled statements to be grouped into
lexically-scoped blocks. With all these, we could eliminate goto
statements. We could also store the program counter in a variable,
branch to some different code, then later return to the place we left
off: this is subroutine calling. Cobol implements this as a "PERFORM"
statement, and Basic as "GOSUB". Fortran enabled subroutine parameters;
Algol enabled return values; C and Lisp brought recursively-called
subroutines; and Scheme brought closures. C++ and Java implemented
exception throwing, giving better control flow. Scala, also having
closures and exception-throwing, eliminates the break and continue
keywords, as they're incompatible with passed-around closured code, and
in version 2.8, re-implements break with exceptions.
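The same trick can be sketched in Groovy, where break can't escape a closure handed to each. This is only an illustration of the idea, not how Scala 2.8 implements it: BreakSignal and eachUntilBreak are hypothetical names invented here, and the "break" is nothing more than an exception thrown inside the closure and caught just outside the loop.

```groovy
// Hypothetical marker exception; not part of Groovy or Scala.
class BreakSignal extends RuntimeException {}

// Runs body for each item until the body invokes the supplied stop closure.
def eachUntilBreak(Iterable items, Closure body) {
    def stop = { throw new BreakSignal() }
    try {
        items.each { item -> body.call(item, stop) }
    } catch (BreakSignal ignored) {
        // the "break" lands here, outside the closure-based loop
    }
}

def seen = []
eachUntilBreak([1, 2, 3, 4, 5]) { n, stop ->
    if (n > 3) stop.call()
    seen << n
}
println seen
```

Because the exit is an ordinary exception, it still works when the loop body is a closure passed around between methods, which a break keyword can't manage.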
Pointers and objects Cobol brought static typing. C enabled static
typing for pointed-at data. Simula and Smalltalk introduced objects.
Different inheritance models were tried out: C++ used multiple
inheritance, Java and C# used single implementation inheritance, Self
and JavaScript used the prototype model, while Ruby and Scala used the
flexible mixin model. Ruby also has open classes, at the cost of
eliminating static typing. Lisp, Ruby, and Java/C# had garbage
collection. By using objects everywhere, a language no longer needs
pointers.
Human interface 3rd generation languages enabled more meaningful
names, making code more readable, but longer. Fortran, C/C++, and
Java/C# brought operator precedences, eliminating parentheses, thus
shortening the code again. Scala simplified the rules for this. In Lisp
and Scala, statements are also expressions, returning a value.
Interactive Python's magic underscore is a simple way to pass a
statement value onwards. APL, and successors J and K, brought greater
tersity through a greater vocabulary of tokens. Matlab and R continue
along this way for math and stats. Perl enabled thematic variation,
bringing "more than one way to do it". Smalltalk was programmed in a
built-in visual environment, as were spreadsheets and IDEs, all
using color. Declarative paradigms, like Snobol, regexes, and Prolog,
make a program more readable. Indenting was used by Cobol and extended
by Python. Haskell offers the choice of C-style or indent-style
bracketing.
Efficiency and concurrency Pure Lisp is very inefficient, but
nowadays different data structures are built in. Numbers were always
hard-coded, direct-access arrays added later, and with Scala, even
objects that inherit are built-in types. Java brought threads for
concurrency, while Erlang and Scala brought the safer higher-level actor
model.
Macros and laziness Lisp enabled macros to control evaluation in
code. Scheme enabled lazy evaluation and Haskell made it compulsory,
eliminating much need for macros. Scala gives the choice of strict or
lazy evaluation, in a statically-typed language. Better compilers can
automatically detect and inline code that would normally require
programmer-control with macros. AspectJ-style aspects and Haskell-style
monads also allow code to be self-referenced and manipulated in a
program.
There are different tradeoffs between these feature sets, and creating a
programming language that combines them is difficult. ML, and
successors Haskell, Caml, and F# achieved this when combining static
typing with functional programming. OCaml and Scala successfully
combined object-oriented programming with functional.
IDEs generally build on the lexical structure of a programming
language. The Scala language compiler is designed as a stairway of
increasingly-higher abstractions. Near the top is the parse tree, one
step short of the lexical structure. I'm attempting to build on top of
this parse-tree layer of Scala 2.8. I want to add APL/J/K-style tersity
to the syntax, including enabling me to use a foreign language
(simplified Chinese) everywhere in my Scala code. When done, I'll
release this language as "Groovy 2.0".
9 September 2009, 9:09:09pm
Groovy life and death
Rick Dillon recently
posted this analysis of programming language evolution. The traditional
imperative languages like C/C++, and the newer ones like Java and C# are
statically typed, while the traditional functional languages like
Lisp/Scheme, and the newer semi-functional ones like Python, Ruby, and
JavaScript are dynamically typed. He gives a code sample implementing
functional currying in statically-typed Java, which turns out to be
quite verbose because of the explicit static types. To put static typing
into functional programming, while retaining tersity, we require type
inference. ML/Caml and Haskell are examples of such languages, and
OCaml/F# and Scala are object-oriented language examples. When seen in
this way, dynamic languages are a dead end in programming language
evolution. Instead of maintaining the "systems language / scripting
language" duo, future language evolution will go along the "functional
language with inferred static typing" route.
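For comparison with the verbose Java version, here's a minimal sketch of currying in dynamically-typed Groovy. It isn't Rick Dillon's sample; the closure names are made up for illustration, and only Closure.curry is a built-in. The code is terse precisely because no types are declared, which is also why a call with the wrong kind of argument would only fail at run time.

```groovy
// An ordinary two-argument closure; no type declarations needed.
def add = { a, b -> a + b }

// curry() fixes the leftmost argument and returns a one-argument closure.
def addFive = add.curry(5)
println addFive(10)

// Explicit currying: a closure that returns another closure.
def multiply = { a -> return { b -> a * b } }
def triple = multiply(3)
println triple(7)
```

A language with inferred static typing aims to keep this brevity while still checking the argument types at compile time.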
I suspect many programmers coming to dynamic languages will follow
the same path I did in realizing this. They will typically work in Java,
C#, Cobol, PHP, and/or VB in their dayjobs. They'll discover Python or
Ruby, though for me it was Groovy with the nifty closures and
collections. At first, they'll just use it for scripty stuff, then start
trying to build bigger and bigger systems. They'll then realise the
lack of static typing means they've thrown out the baby with the
bathwater. The functional languages with inferred typing will then
beckon. Groovy programmers will start learning Scala because it runs on
the JVM. At first they'll think that Scala will only replace Java, so
they can use Groovy and Scala together, but eventually they'll see that
statically-typed functional languages can replace both members of the
"systems language / scripting language" duo! (Perhaps some will even
say, as I did, that using the Groovy Language started off being useful,
but "what began as life to me has now become death"!)
Lately, I've been trying to understand the interplay between
different features in these types of languages, such as monads, macros,
and mixins:
(1) Monads from Haskell enable computer languages to cleanly split
program code into functional-pure and side-effecting components.
Aspects, as in AspectJ and Spring, are frequently used in a system-wide
manner in non-functional-paradigm languages to separate out certain
non-paradigm concerns such as I/O, persistence, exception-handling,
optimization, etc, from the primary representational concern. This type
of separation between the representational and interactional functions
of a programming language mirrors that in natural language, as analyzed
in Hallidayan Systemic Functional Grammar theory.
(2) Syntactic macros can provide the most user-configurability at the
surface levels, as in Lisp/Scheme and Dylan. Most programming languages
provide much power in the engine, then deliberately bottleneck it for
the language syntax, only to return it to the programmer at the IDE
level. Natural languages don't do this, and I don't think computer
languages should, but a programming language syntax is considered a holy
grail for marketing the language, so not many languages have dared to
allow such syntactic configurability in the past. Perhaps this syntactic
component of programming languages mirrors the textual component of
natural language in Systemic Functional Grammar.
(3) Scala traits (i.e. mixins) provide a more flexible yet still
correct OOP system than either single or multiple inheritance. The Scala
website shows how they can be used to cleanly implement the Observer
pattern, the very pattern the AspectJ evangelists 10 yrs ago were saying
aspects could easily implement in the non-functional language Java.
No single statically-typed functional language provides all these
features, not that I yet understand them all, and how they relate to
each other. I do intend to return to creating a shell over the Scala
language parse tree once Scala 2.8 is out because I think this is the
best opportunity to evangelize full Unicode character set programming to
the world. The shell will be called GroovyScala.
21 August 2009
A Groovy Undertaking
This mailing list reply from Jochen
Theodorou popped up on the Reddit-programming charts recently, probably
stage-managed damage-control. Jochen wrote: "James (Strachan) is great
in initiating projects and gets them to a state where the examples work.
But as soon as you go away from the examples and alter them just a tiny
bit, it fails." Yeah, that's called Test-driven development. The
solution: add more tests, then make them work! I'm a great believer in
it. Jochen also wrote: "...as an active part (James dumped Groovy) over
four years ago already. (...) You can say that current Groovy is
Guillaume (Laforge) and me mostly, but many people did come and go, some
did contribute a lot... like for example John Wilson, other did only
cover a small area." The Groovy developers seem to be positioning James
as only one of many Groovy Language "creators". What's up? Is someone
else hoping to stand in for language creator James in an upcoming Groovy
Language interview in Australian Computerworld?
Why did some developers "only cover a small area"? Perhaps they
started getting harassed after surfacing on the Groovy mailing list, as
I did 3 yrs ago? At the time I thought it was just my name! I knew
anyone could've been doing it, but around that time two UK teachers at
my university took me to dinner and warned me that "anyone who takes on
Google comes off second best". I doubt it was Google who put them up to
that, and why would the Groovy FUD-spreaders do so? I suspect the real
reason the present developers took control of Groovy was to try and sell
it to Google as a brand name fit. But I didn't really understand how I
was a threat to them. Early the following year, when Groovy 1.0 was
finally released, the licence still clearly said: 4. Products derived
from this Software may not be called "groovy" nor may "groovy" appear in
their names without prior written permission of The Codehaus. "groovy"
is a registered trademark of The Codehaus. I was just piggybacking the
Groovy name because I thought it might be a good gimmick if I wanted to
return to programming work one day, not for any other reason.
Then one day I was fooling around online, and looked up the U.S.
Trademark database for the Groovy Language details. They weren't there!
They weren't even in the history of lapsed trademarks. Codehaus was a
US-based outfit, weren't they? Beta-1 of Groovy 1.1 was then released
with the Apache licence. Groovy's previous licence had only been a
bluff! But was that really a reason to harass me? Other programming
languages don't trademark their names: I suspect if I changed my name to
Scalia Scalow and surfaced on the Scala mailing list, no-one there
would feel insecure enough to harass me because of my name. It seems
there's an essential difference between languages: Scala is a quality
language designed within academia, though intended for business, to
bring present Java and C# developers a little closer to functional
programming. Groovy is an ad hoc commercial creation, designed to flip
the investing companies at a profit, first Bay Partners, then
SpringSource, and now VMware. I suspect it's that difference that makes
the Groovy Language developers ultra-picky about who's involved in the
development.
Jochen also wrote: "Developing a language is a lot of stress. You
have to discuss things on an emotional level very often. (...) And many
people get tired of these discussions, so did James and so did for
example John." Is he priming up the community for the next departure,
perhaps himself or Guillaume? Because I imagine VMware had more cash in
their "cash and stock" offer for SpringSource than did SpringSource in
theirs for G2One, perhaps Guillaume's suddenly lost some motivation to
continue with (J)Groovy development. And what about me? After 4 years of
working on an idea to make programming languages terser using all
Unicode tokens, staying in mainland China because it's the home of the
simplified Chinese characters and targeting the Groovy AST because of
its ad hoc construction and its groovy name, I confess I'm also getting a
little tired of it all. I never really knew when I first got involved
what a truly dirty business open source software development is.
8 August 2009
Groovy futures
The (J)Groovy developers recently released
beta-1 of "Groovy 1.7". But will it really be called version 1.7 ? The
developers changed the name of Groovy 1.1 to 1.5 at the last moment, and
they might do it again with version 1.7. Besides plucking some stuff
out of Spock and ASM, they've begun on inner classes, the main feature
from Java still missing from (J)Groovy. And they've dusted off the
GroovyScriptEngine, rewriting it, probably as a snipe at my own
GroovyScript-branded version of Groovy for the Scala parse tree. I'm not
sure how many developer hours SpringSource threw at beta-1, but I
suspect not many. They need to keep up the appearance of developing the
Groovy Language, while continuing to collect consulting fees, to get a
high valuation in their talks with JBoss or whoever it is.
While "Groovy 2.0" has been talked about as the version of (J)Groovy
shipping with a new improved MOP, this current 1.7 line might end up
with that name, not because it has a new MOP but for marketing reasons
only. Perhaps they'll bring out new editions of their books. The Groovy
Language release schedule now seems to completely revolve around
marketing and training events. But where would the new MOP fit in? If
the developers finally manage to do what John Wilson couldn't, what
Groovy 1.x botched, in producing a Java-language compatible MOP for the
JVM, would SpringSource really want to prewrap it in a programming
language and tag it with the "Groovy" brand? I'd think they'd want to
pitch it as the JVM's answer to Microsoft's DLR, something like "the
Spring DLR for the JVM", and promote it for all JVM-based dynamic
languages.
But even without a new MOP, doesn't the Groovy Language 1.x still
have a future? It's certainly the language of choice for Grails. For
other use cases, such as scripting and testing, it may now be
superseded. Before Groovy, developers used JPython. Some pitched Groovy
as a better choice because it's Java-syntax compatible, but I don't
think they really understood the mindset of a typical corporate
programmer. Programmers want to expand their skillsets, so would rather
choose JRuby for scripting and testing because it's NOT Java-syntax
compatible. JRuby is curriculum vitae compatible for Java developers,
being another step up to a Rails job. And what of (J)Groovy's recent
push with Griffon? With Grails there was little serious competition for
Groovy, but with Griffon, Groovy is up against the might of JavaFX. The
recent trend of calling (J)Groovy "Groovy on Grails" may have hit the
mark.
As the Groovy Language underwriter, I often think about Groovy's
future, both the technology and the brand, and I'm now a little
pessimistic. A week after I said I was switching from C# to Scala/JVM
for programming to the Groovy Language AST, Groovy Language creator
James Strachan (by another total coincidence :-) bought and read the
Programming Scala book, and subsequently said he thought Scala was a
better choice than Groovy for systems programming. After programming in
Scala for a mere month, and that only part-time, I realized Scala is
already the language I was trying to modify the (J)Groovy AST to be.
There are still a few things Scala lacks, such as syntactic macros, but
I've no doubt they'll be coming in an upcoming version of Scala. What
programming I did do in Scala (i.e. build a combinator parsing library),
I later discovered a better version already existed in the Scala
libraries. Recently I started to doubt why Codehaus (J)Groovy/JVM and
Codeplex Groovy/DLR exist when I could target both platforms via the
Scala parse tree, but now I'm wondering why I'm programming at all?
With (J)Groovy being too minimal a wrapper for an upcoming
"SpringSource DLR for the JVM", with corporate developers preferring
JRuby to Groovy for scripty stuff, with Swing already being targeted
effectively by JavaFX, and with Scala becoming recognized as the best
choice for new systems programming for the JVM, the only role for Groovy
in the foreseeable future seems to be as Groovy 1.x for Grails. People
will probably just call it "the Grails language". Perhaps the only
future for the Groovy brand is as my middle name.
28 July 2009
One Groovy Language to rule them all
Some say that because
programmers read code far more than they write it, it's better for a
language to have a clean minimal syntax, so we can easily read code
others have written. But natural languages don't work that way.
Whatever our native language, we can read many more words than we use
when we write, and we can understand in listening many more words than
we usually speak. As we learn our native tongue as children, we hear
many varieties of it and much vocabulary, learn to understand it
quickly, but we seldom reproduce most of it. Even as adults, it doesn't
take long when listening to a new flavor and accent of English to
understand it, but we take much longer to reliably imitate it, if at
all.
I once spent a couple years studying natural language, then returned
to programming as a hobby. I'd never liked Perl: the "there's more than
one way to do it" philosophy had never appealed; I'd preferred the more
minimal syntax of Python or Smalltalk. But when I returned to
programming, what Larry Wall's been saying all these years began to make
sense. Programming language designers who restrict what the language
can do, providing only one way to do things, are like the grammar school
English teachers who try to prescribe to their students what correct
English is.
Mainframe programmers read lots of Cobol programs to understand their
meaning, but don't write much of it when maintaining programs. An
experienced Cobol programmer can flick through a printout and quickly
understand the program. Computer Science students read the C code in the
Unix kernel, but seldom change it. They read it so they can read C code
easily. We should be able to understand code written by others, not by
constraining what others can write, but by more experience in reading
what others have written.
But one programmer can only really read code easily in one or two
computer languages, just as most people can only learn one or two
natural languages really well. For this reason, programmers are
categorized by the language they program in.
The Groovy Language will solve this problem by being available for
every AST. As well as the (J)Groovy flavor for the Groovy/JVM
AST, the GroovyScript flavor for the Scala parse tree, and the
Groovy-DLR flavor for Microsoft's DLR, the Groovy Language will
eventually be available for every AST platform. The original (J)Groovy
syntax was a close copy of Java's, while being semantically different;
Java's was a close copy of C++'s, also semantically different. So
someone who knew Java had a head start learning Groovy, and so on.
The Groovy Language will be the end of the line for C-style syntax,
available for every practical AST, and so replacing other programming
language syntaxes. Therefore, if someone learns (J)Groovy, they can then
switch to using Groovy-DLR easily, just as when someone learns British
English, they can switch to using Indian English easily. There will be
one Groovy Language to rule other computer language syntaxes. I guess
they'll eventually become obsolete.
27 July 2009
What makes Scala groovy?
Lately, I've been thinking about what makes Scala groovier than (J)Groovy...
(1) Nested classes and packages. I can nest my class definitions any
way I want when doodling, i.e. doing experimental programming. Groovy
DevCon 5 talked about nested classes for Groovy 1.7. Anonymous inner
classes aren't necessary, though, as closures can simulate them.
(2) Pattern matching. Pattern matching is an incremental addition to a
programming language that, once learnt, is hard to do without. Groovy
2.0 is slated to bring pattern matching, but some think the coming
Groovy 2.0 is a myth, just like JSR 241 and the Groovy language spec.
(3) Combinator parsing. People are bored with the limits of regexes,
and want more declarative power in parsing stuff. Scala now has a terse
combinator parsing syntax, and Scala 2.8 will introduce the more
efficient packrat parsing trait for them.
(4) Consistency of syntax and semantics, e.g. the method/field
uniform access principle, as opposed to the tack-on approach of
(J)Groovy, which is necessary to ensure seamless backwards-compatibility
with Java classes. Returning to Groovy coding after working with
Scala's "clean break with Java" design, though, is harder than returning
to Java coding after working with Groovy. Scala's operator/method and
parameter/indexing dualities are features that could successfully be put
into Groovy, though.
(5) Mixins/traits. The (J)Groovy 1.1 (betas) AST had empty stubs for
mixins, but the Groovy developers never implemented them. After seeing
how Scala traits could do things I thought were only elegant with
aspects, e.g. the Observer pattern, I now believe Groovy needs those
mixins.
(6) Inferred static typing. After working with this in an IDE, one
wonders how the "more tests are better than more typing" lie spread so
rapidly. Inferred static typing is "more typing (static) with less
typing (fingers on keyboard)".
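To make point (3) concrete, here is a minimal sketch of the combinator-parsing idea, written in Python rather than Scala. It is not Scala's library: the helper names (token, seq, alt, many, fmap, parse_sum) are made up for this illustration. Each parser is just a function, and bigger parsers are built by combining smaller ones, so the grammar reads declaratively.

```python
import re

# A parser here is a function (text, pos) -> (value, new_pos), or None on failure.
# All helper names below are hypothetical, chosen only for this sketch.

def token(pattern):
    """Match a regex at the current position, skipping leading whitespace."""
    rx = re.compile(r"\s*(" + pattern + r")")
    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return parse

def seq(*parsers):
    """Run parsers one after another; succeed only if all succeed."""
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    """Ordered choice: return the first alternative that succeeds."""
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

def many(parser):
    """Apply a parser zero or more times, collecting the results."""
    def parse(text, pos):
        values = []
        while True:
            r = parser(text, pos)
            if r is None:
                return values, pos
            value, pos = r
            values.append(value)
    return parse

def fmap(parser, fn):
    """Transform the value of a successful parse."""
    def parse(text, pos):
        r = parser(text, pos)
        return None if r is None else (fn(r[0]), r[1])
    return parse

# Grammar: sum ::= number ("+" number)*
number = fmap(token(r"\d+"), int)
sum_expr = fmap(
    seq(number, many(seq(token(r"\+"), number))),
    lambda v: v[0] + sum(n for _, n in v[1]),
)

def parse_sum(text):
    """Parse and evaluate a sum, rejecting trailing input."""
    r = sum_expr(text, 0)
    if r is None or text[r[1]:].strip():
        raise ValueError("parse error")
    return r[0]
```

Calling parse_sum("1 + 2 + 39") evaluates the sum while parsing it. The alt combinator is not needed for this tiny grammar; it is included only because ordered choice is the other basic building block.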
What (J)Groovy features are groovier than Scala's?
(1) Builders. I've
heard Ruby copied this feature from Groovy. Has it been done in Scala?
Scala's syntax already allows it, though I've yet to see a Scala
implementation of Groovy's HtmlBuilder. Scala's inline XML syntax is
ugly compared to builder-based syntax.
(2) GStrings. Also known as interpolated strings, they enable us to
do much commonly used string handling, e.g. printing, with a terser
syntax. Perhaps Scala's scalable syntax could enable these without
syntax changes; I don't know.
(3) Dynamic typing. Dynamic typing is useful in the 20% of code where
static typing isn't suitable. Just as dynamic Python enhances static C
code, dynamic Groovy enhances static Java code. Static languages can
emulate some dynamic typing features by typing everything with the
Object type, or using an expando object. Open classes that enable
inheritance may be impossible to emulate, though. (Groovy also has
built-in static typing, which is slower than its dynamic typing. Use this
feature for interface documentation only; use Java instead for other
static typing requirements.)
It seems Scala could copy (J)Groovy's groovy features far more easily than
Groovy could copy Scala's. Although Groovy pitches itself as
"complementing, not competing with" Scala, since programming in Scala,
I've yet to find much that Groovy's a more obvious fit for. Scala's
terseness and inferred typing are addictive.
What could make both Scala and (J)Groovy groovier?
Self-mutating
syntax. This would enable syntactic macros and keyword aliasing, thus
putting Scala into the realm of Lisp/Scheme. Some of Scala's syntax
looks like it could be redefined as a syntactic macro, e.g. the for
comprehension could generate the underlying calls to map, filter, etc.
After pulling out these types of simplifications, perhaps Scala's
remaining syntax would be easily handled by a library based on Scala's
own parser combinators, making Scala syntax self-referential.
The GroovyScript source code I've posted enables annotations to
define lexical and syntactic features of a C-syntax language such as
(J)Groovy or Scala, perhaps another way of making it self-referential.
It relies on monadic bind and return/value parser combinators, which
makes the grammar "context-sensitive". Packrat parsing can handle the
grammars it supports (PEG-style ones) in linear time, though at the cost
of memory space: can multicores keep such context-sensitive parsing
tractable as well?
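The time-for-memory trade in packrat parsing can be sketched as follows. This is an illustration in Python, not Scala's packrat trait and not GroovyScript code: the class Packrat, the three rule functions and evaluate are hypothetical names. The parser caches the result of every (rule, position) pair, so backtracking never re-evaluates a rule at the same place. It covers only the context-free case and says nothing about the context-sensitive question above.

```python
class Packrat:
    """Backtracking parser state with an optional (rule, position) memo table."""

    def __init__(self, text, memoize=True):
        self.text = text
        self.memoize = memoize
        self.memo = {}   # (rule name, position) -> cached result, possibly None
        self.calls = 0   # how many rule bodies were actually evaluated

    def apply(self, rule, pos):
        key = (rule.__name__, pos)
        if self.memoize and key in self.memo:
            return self.memo[key]
        self.calls += 1
        result = rule(self, pos)
        if self.memoize:
            self.memo[key] = result
        return result

    def lit(self, ch, pos):
        """Return the position after ch if it occurs at pos, else None."""
        return pos + 1 if self.text.startswith(ch, pos) else None


# Grammar (PEG-style ordered choice):
#   additive  <- multitive '+' additive / multitive
#   multitive <- primary '*' multitive / primary
#   primary   <- '(' additive ')' / digit
# Each rule returns (value, new_pos) or None.

def additive(p, pos):
    left = p.apply(multitive, pos)
    if left is not None:
        nxt = p.lit("+", left[1])
        if nxt is not None:
            right = p.apply(additive, nxt)
            if right is not None:
                return left[0] + right[0], right[1]
    # Second alternative re-parses multitive at the same position;
    # the memo table turns that into a lookup.
    return p.apply(multitive, pos)

def multitive(p, pos):
    left = p.apply(primary, pos)
    if left is not None:
        nxt = p.lit("*", left[1])
        if nxt is not None:
            right = p.apply(multitive, nxt)
            if right is not None:
                return left[0] * right[0], right[1]
    return p.apply(primary, pos)

def primary(p, pos):
    nxt = p.lit("(", pos)
    if nxt is not None:
        inner = p.apply(additive, nxt)
        if inner is not None:
            end = p.lit(")", inner[1])
            if end is not None:
                return inner[0], end
    if pos < len(p.text) and p.text[pos].isdigit():
        return int(p.text[pos]), pos + 1
    return None

def evaluate(text, memoize=True):
    """Return (value, number of rule evaluations) for the whole input."""
    p = Packrat(text, memoize)
    result = p.apply(additive, 0)
    if result is None or result[1] != len(text):
        raise ValueError("parse error")
    return result[0], p.calls
```

Running evaluate on a deeply nested input such as "((((((1))))))" with and without memoize shows the difference: both give the same value, but the memoized run evaluates each rule at most once per position, while the plain backtracking run repeats work at every nesting level. The memo table is the memory cost.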
22 July 2009
Scala eclipses (J)Groovy
I've posted beta-2 of
GroovyScript. GStrings are now parsing. The parser, written in Scala,
uses a pushback lexer so lexical tokens can be defined in the parsed
syntax using annotations. What's there may be useful for someone to see
how a parser with a pushback lexer can work. However, I'm now looking at
whether I can rewrite it as an extension to Scala's built-in combinator
parser library, so don't expect anything more for a while.
The more I program in Scala, the more convinced I become that it's
the grooviest Groovy Language of all. Scala's lexing and syntax need to
be more customizable, though, which is what GroovyScript's all about,
adding an alternative lexer/parser to the Scala parse tree, to make the
syntax self-referential, thus enabling syntactic macros and keyword
aliasing. So a few weeks ago, I decided to switch the primary reference
implementation for the Groovy Language from (J)Groovy to GroovyScript.
I've still got a lot of learning and work to do though. Just as I
programmed in Groovy for a year before surfacing on their mailing list,
it will probably take that long or longer before I have much to
contribute to Scala.
9 July 2009
The grooviest Groovy of all!
I've posted beta-1 of
GroovyScript online. It's a lexer and parser with just-in-time pushback
lexing, with an Apache licence, written in Scala. When the parser
backtracks, it pushes unused tokens back into the lexer. Hence we can
write a lexer/parser that lets lexical rules be defined as
annotations, using regexes, in the parsed code itself. The following code snippet
parses correctly in beta-1:
    abc;
    @AddComment('//[^\r\n]*') try{
    defg; //hi!!!
    987;
    @Anno try{
    @DoIt(7, 'abc',) zyx;
    16.8
    };
    'bcdefg';
    };
    @Anno @Letter hijk;
    lmnop;

The @AddComment annotation causes //-comments to be recognized as
whitespace within the statement it tags (and, eventually, within any
external files parsed from inside that statement), but not before or
after the tagged statement. That's all that's working for now, but I'll
eventually put in
everything I blogged about in my last post, e.g. custom lexing rules,
syntactic macros, name aliasing, a Unicode IME. Scala certainly proved
its worth for this challenging exercise, making me think about the best
path forward for GroovyScript, the 3rd language in the Groovy
Programming Language family, after (J)Groovy and Groovy-DLR. As a
result, I'm switching the GroovyScript target platform from the
(J)Groovy AST to the Scala 2.8 parse tree.
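The pushback idea can be sketched as follows, in Python rather than Scala. This is not the GroovyScript source: the class PushbackLexer, its methods and lex_all are hypothetical names, and the sketch assumes that "pushing a token back" means rewinding the lexer to where that token started. Because tokens are produced only on demand, a token lexed under the old rules can be handed back and the same text re-lexed after a new rule, such as a comment pattern, has been added.

```python
import re

class PushbackLexer:
    """Lexes one token at a time, so rules can change between tokens."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.skip = [re.compile(r"\s+")]         # patterns treated as whitespace
        self.token = re.compile(r"\w+|[^\w\s]")  # a word, or one symbol character

    def add_skip(self, pattern):
        """Register an extra pattern to be skipped like whitespace."""
        self.skip.append(re.compile(pattern))

    def next(self):
        """Return the next token as (text, start_offset), or None at end of input."""
        moved = True
        while moved:
            moved = False
            for rx in self.skip:
                m = rx.match(self.text, self.pos)
                if m and m.end() > self.pos:
                    self.pos = m.end()
                    moved = True
        m = self.token.match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group(), m.start()

    def pushback(self, tok):
        """Hand a token back: rewind so its text is lexed again later."""
        self.pos = tok[1]


def lex_all(text, late_comment_rule=None):
    """Collect token texts; on the first '/', push it back, add the rule, re-lex.

    The trigger on '/' stands in for a parser that backtracks after meeting
    an annotation; it is an assumption made to keep the sketch short.
    """
    lexer = PushbackLexer(text)
    out = []
    while (tok := lexer.next()) is not None:
        if tok[0] == "/" and late_comment_rule is not None:
            lexer.pushback(tok)
            lexer.add_skip(late_comment_rule)
            late_comment_rule = None
            continue
        out.append(tok[0])
    return out
```

With the same comment regex as in the snippet above, lex_all("abc; // hi!\ndefg;", r"//[^\r\n]*") drops the comment even though its first slash had already been lexed as a symbol; without the rule, the slashes and the comment words come through as ordinary tokens.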
Why? It happened like this... About a year ago, I started converting
some hard-to-debug lexer/parser code written in Groovy to C#, just to
code it anew somewhere, hoping to debug the logic. The Visual Studio
editor complained about the static types not matching. I fiddled it so
the types matched, then discovered I had also debugged the logic
problem. That was when I started to reconsider the supposed benefits of
dynamically typed languages. Java-style static typing is verbose,
but with type inference it rocks! C# has some type inference,
but Scala's is incredible! I now seriously doubt the benefits of dynamic
typing over its costs.
I started off building Groovier for the GrAST in Scala, but
discovered Scala itself was already the grooviest language of all! First
came C, then C++, then Java, which should be called C3+, because after
that C# came along, the sharp symbol (#) being 4 plus signs (+) joined
together, which makes Scala C5+. Because I'm now more impressed
with the Scala language engine than with (J)Groovy's, I've decided to
switch GroovyScript's target platform from the GrAST to the Scala 2.8
parse tree. With inferred static typing, it's at a higher level of
abstraction than the GrAST. And unlike the GrAST, I can bundle it with
GroovyScript because its name is different.
But not only that, Scala's also multi-platform, running on both the
JVM and the CLR. Are Codehaus (J)Groovy or Codeplex Groovy-DLR really
needed? So as the underwriter for the Groovy Language, I'm also
switching from (J)Groovy to GroovyScript as the primary reference
implementation for the Groovy Language. The Scala language engine is now
the primary platform for the Groovy Language. (J)Groovy was the first
language in the Groovy Language family, but GroovyScript will soon be
the leading-edge one. GroovyScript will then change its name to Groovy
2.0. Because it's now Apache-licensed, the (J)Groovy developers could
adapt it to the GrAST and bundle it with (J)Groovy if they really wanted
to.
16 June 2009
Gr8 isn't great, it grates
A few short months after my
very first posting to the Groovy Language mailing list, Graeme Rocher
changed the name of Groovy on Rails to Grails. It was too late to change
Groovy's groovy name as well, but I suspect the Groovy developers will
do so by stealth for version 2. As the underwriter of the Groovy
Language, I must ensure Groovy's development continues, and do so under
its present name.
The Groovy developers recently created the "Gr8 family of
technologies" brand, i.e. Groovy / Grails / Griffon / Gant / etc (see
http://twitter.com/aalmiray/status/1906155191) in direct response to my
blog post at
http://gavingrover.blogspot.com/2008/11/groovy-language-family.html. I
suspect "Gr8" is also their upcoming name for the dynamic language
engine inside Groovy 2.0, to compete with Google's V8 JavaScript engine
inside Chrome. Of course, the "Gr8 dynamic language engine" would
soon after become an engine for all JVM-based dynamic programming
languages, itself a good idea, but the SpringSource developers might
then quietly ignore Groovy 2.0 support in favor of other languages
running on the Gr8 engine. Like the five-year-old JSR at
http://www.jcp.org/en/jsr/detail?id=241, Groovy 2.0 would become a
carcass, its only purpose to prevent anyone else using the brand.
GroovyScript will be a GPL-licensed lexer/parser for the language
engine inside the Groovy 2.0 Language. If that engine changes its name,
GroovyScript will then be allowed to bundle the engine as part of its
distro.
See http://gavingrover.blogspot.com for more details on the Groovy Language.