Daniel Prusa

Jan Jancura

(442) Project Schliemann: Generic Support for Integration of Programming and Scripting Languages into NetBeans IDE

Peer-Refereed Talk

Tuesday, 2007-06-26, 15:50 - 16:30, Arena 7

Abstract

*Motivation*

Nowadays, it is common for a single IDE to support many file types. IDEs are
able to recognize and handle structured files such as .html, .xml, .sh and .bat.
They also quite often support not one, but several programming languages. The
growing number of scripting languages connected to web technologies contributes
substantially to this trend.

In the following text, "language" refers to the set of all structured files of
one type. Providing hand-coded support for several languages in an IDE can be
problematic from several points of view - the important factors are development
time, maintenance, performance (scalability) and memory requirements. If we
would like to support about 100 languages, an easy, generic framework for
implementing a language, covering editing features like syntax coloring or code
folding (and many more), would be a wise solution. This is exactly what the
project Schliemann offers.


*Project Schliemann*

The project Schliemann consists of an engine that provides a generic framework
for defining a language and integrating it into NetBeans IDE. The support
concerns mostly editing and visualization features. Besides the generic
features, custom features can be implemented on top of the engine output for a
particular language.

The engine has been inspired by the language support present in many
programming editors like Emacs, vi or jEdit. However, these editors typically
implement basic features only (syntax coloring, indentation, code folding). They
usually offer a proprietary way to define lexical analysis, but syntax analysis
is missing. The goal of the project Schliemann is to go far beyond this approach
and to offer many more possibilities. On the other hand, the project does not
aim to provide a framework for complete programming language support, including
the ability to compile, debug and run programs.


*NBS Language*

To integrate a particular language using the Schliemann engine, the language
has to be described by a so-called NetBeans Schliemann (NBS) file, which is a
text file consisting of sections. The structure of a language is defined by the
lexical section, consisting of regular expressions that determine the tokens of
the language, and the syntactic section, which contains grammar productions. The
other sections define the language visualization.

A common syntax is used for regular expressions and grammar productions in NBS
files. The form of grammar productions follows the extended Backus-Naur form.
LL(k) grammars are allowed. Regular expressions are enriched by state
definitions that simplify the description of tokens. In addition, both analyses
can call Java code that handles a portion of the analysis, returns the detected
tokens or the derivation subtree, respectively, and passes control back to the
engine. This mechanism makes it possible, e.g., to handle languages whose tokens
cannot be described by regular expressions (like Ruby or JavaScript).
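
The idea of regular expressions enriched by states can be illustrated outside
NBS. The following Python sketch is neither NBS syntax nor the engine's
implementation; the rule table, the state names and the `tokenize` function are
hypothetical. It only shows how the active state selects which token rules
apply, so that a digit inside a string literal is not reported as a number.

```python
import re

# Hypothetical rule table: state -> list of (token type, regex, next state).
RULES = {
    "DEFAULT": [
        ("string_start", r'"', "IN_STRING"),
        ("number", r"[0-9]+", "DEFAULT"),
        ("identifier", r"[A-Za-z_][A-Za-z0-9_]*", "DEFAULT"),
        ("whitespace", r"\s+", "DEFAULT"),
    ],
    "IN_STRING": [
        ("string_end", r'"', "DEFAULT"),
        ("string_text", r'[^"]+', "IN_STRING"),
    ],
}


def tokenize(text):
    """Return (type, lexeme) pairs; the active state selects the rules."""
    state, pos, tokens = "DEFAULT", 0, []
    while pos < len(text):
        for token_type, pattern, next_state in RULES[state]:
            match = re.compile(pattern).match(text, pos)
            if match:
                tokens.append((token_type, match.group()))
                pos, state = match.end(), next_state
                break
        else:
            raise ValueError(f"no rule matches at offset {pos} in state {state}")
    return tokens
```

Calling `tokenize('x "a 1" 42')` reports the `1` as part of a `string_text`
token and only the trailing `42` as a `number`.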

Features are defined based on recognized tokens as well as on non-terminals of
the grammar. For example, in a programming language, syntax coloring can be
defined for a keyword (which is a token detected by the lexical analysis) as 
well as for a method name (a non-terminal detected by the syntactic analysis).

The format of a feature definition is intuitive and easily readable. To
demonstrate this, let us show two fragments of an NBS file:

TOKEN:number:( ["0"-"9"] | ["1"-"9"] ["0"-"9"]* )

COLOR:number {
    foreground-color:"orange";
    font-type:"bold";
}

The first fragment defines the token 'number' by the given regular expression.
The second fragment defines the coloring for this token; it specifies the
foreground color and the font type.
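
To make the effect of the two fragments concrete, here is a minimal Python
sketch of how a token definition and a coloring definition could be combined.
It is not how the engine works internally; `NUMBER`, `STYLES` and `color_spans`
are hypothetical names, and the Python pattern is an assumed equivalent of the
NBS expression (the alternatives are reordered because Python's `re` module
takes the first alternative that matches).

```python
import re

# Assumed Python equivalent of the 'number' token: a multi-digit literal
# without a leading zero, or a single digit.
NUMBER = re.compile(r"[1-9][0-9]*|[0-9]")

# Hypothetical style table mirroring the COLOR:number fragment.
STYLES = {"number": {"foreground-color": "orange", "font-type": "bold"}}


def color_spans(text):
    """Return (start, end, style) for every 'number' token found in text."""
    return [(m.start(), m.end(), STYLES["number"]) for m in NUMBER.finditer(text)]
```

For the text `x = 42 + 7`, the sketch yields one styled span for `42` and one
for `7`. It scans raw text only; a real lexer would first separate identifiers
and other tokens.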

The features are made richer by the possibility to provide Java code as a part
of their definition. The code is required whenever language-specific behavior
that cannot be described by a common pattern has to be implemented. A good
example is the Hyperlink feature. A hyperlink is defined by a token or
non-terminal on which it can be enabled. The action to be performed on clicking
the hyperlink is specified by a static method, the action performer, which is
referenced in the hyperlink definition.
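
The dispatch from a clicked element to its action performer can be pictured
with a small registry. The sketch below is written in Python for brevity,
whereas the engine expects a static Java method; the registry, the decorator
and the `function_name` element are hypothetical and serve only to illustrate
the pattern.

```python
# Hypothetical registry: element name -> action performer.
PERFORMERS = {}


def hyperlink(element):
    """Register the decorated function as the performer for an element."""
    def register(func):
        PERFORMERS[element] = func
        return func
    return register


@hyperlink("function_name")
def go_to_declaration(lexeme, document):
    """Return the offset of the first 'function <lexeme>' in the document, or -1."""
    return document.find("function " + lexeme)


def click(element, lexeme, document):
    """Dispatch a click; elements without a performer are not hyperlinks."""
    performer = PERFORMERS.get(element)
    return None if performer is None else performer(lexeme, document)
```

A click on an element with a registered performer returns the jump target; a
click on any other element does nothing.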


*Supported Features*

The most important generic features are listed here, together with brief
descriptions.

- Syntax coloring: to distinguish tokens and possibly non-terminals of the
language by color in the editor.

- Code folding: to collapse and expand pieces of code (e.g. methods) in the
editor.

- Navigation: to browse logical elements of the language in the Navigator
window.

- Imports: to import another language into a given language. This makes it
possible to implement language embedding.

- Code completion: to offer completions of a piece of code based on the typed
prefix.

- Brace matching: for a bracket located under the cursor, to highlight the
matching bracket.

- Actions: to define language specific actions over documents.

- Tooltips: to display tooltips on elements of the language.

- Hyperlinks: to implement jumps driven by language elements into a logically
bound part of a document (the same or a different one).

- Indentation: to properly indent documents based on the language structure.

- Annotations: to annotate specific lines of documents (e.g. error lines).

Besides the generic features, it is possible to implement custom features based
on the output of the lexical and syntactic analysis.
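
As an illustration of one of the listed features, brace matching can be
sketched in a few lines of Python. This is a simplified, hypothetical version
that scans raw characters; an engine working on lexer output would operate on
tokens and would therefore not be confused by brackets inside strings or
comments.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def matching_bracket(text, offset):
    """Return the offset of the bracket paired with text[offset], or None."""
    char = text[offset]
    if char in PAIRS:
        target, step = PAIRS[char], 1      # opening bracket: scan forward
    elif char in CLOSERS:
        target, step = CLOSERS[char], -1   # closing bracket: scan backward
    else:
        return None
    depth, pos = 0, offset
    while 0 <= pos < len(text):
        if text[pos] == char:
            depth += 1
        elif text[pos] == target:
            depth -= 1
            if depth == 0:
                return pos
        pos += step
    return None
```

The depth counter skips nested brackets of the same kind, so the outer
parenthesis in `f(a[1], (b))` is paired with the last character rather than
with the first closing parenthesis.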


*Notes on Engine Implementation*

The engine contains a general parser for LL(k) grammars. We have decided to have
our own implementation to meet our requirements on the parser input, output,
error recovery, internal architecture, grammar correctness checking, performance
tuning, etc. The produced abstract syntax trees include information on comments,
whitespace and positions. One of the important features supported by the
engine's internal architecture is language embedding.
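
To show what a position-aware syntax tree produced by LL parsing can look
like, here is a minimal hand-written LL(1) parser in Python for a toy grammar
of nested number lists. It is a sketch, not the engine's general LL(k) parser:
the grammar, the node layout and the function names are hypothetical, and
comments and whitespace are not handled.

```python
def parse_list(text, pos=0):
    """Parse list := "[" [ item { "," item } ] "]" starting at pos.

    Returns a node dict with its source span. One character of lookahead
    decides every production, so the grammar is LL(1).
    """
    start = pos
    pos = expect(text, pos, "[")
    children = []
    if text[pos:pos + 1] != "]":
        child = parse_item(text, pos)
        children.append(child)
        pos = child["end"]
        while text[pos:pos + 1] == ",":
            child = parse_item(text, pos + 1)
            children.append(child)
            pos = child["end"]
    pos = expect(text, pos, "]")
    return {"kind": "list", "start": start, "end": pos, "children": children}


def parse_item(text, pos):
    """Parse item := number | list."""
    if text[pos:pos + 1] == "[":
        return parse_list(text, pos)
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        raise SyntaxError(f"expected number or '[' at offset {pos}")
    return {"kind": "number", "start": pos, "end": end, "children": []}


def expect(text, pos, char):
    """Consume one expected character and return the offset after it."""
    if text[pos:pos + 1] != char:
        raise SyntaxError(f"expected {char!r} at offset {pos}")
    return pos + 1
```

Every node records the half-open span `[start, end)` it covers, which is the
kind of information features such as folding or navigation need.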

The lexical analysis can be connected to the incremental analyzer provided by
the NetBeans editor module. The engine also has its own analyzer, but it is not
incremental.

Languages are defined in separate modules. The engine detects them using a
NetBeans-specific mechanism.


*Project Status*

Currently, the Schliemann engine is implemented in the development builds of the
upcoming NetBeans version (6.0). We have proved that the concept works well.
Over 20 languages have already been integrated into NetBeans using the engine.
These languages include JavaScript and Ruby, for which full, grammar-based
support has been implemented. This can be considered the main achievement. Among
the other languages, it is worth mentioning PHP, Groovy, bat files, shell
scripts, CSS, Fortran, COBOL and several NetBeans-specific file types. For now,
the integration of these languages is based on the lexical analysis only.

Provided that a grammar is available for a language, we have proved that
implementing all the supported features is an easy task which can be completed
by one person within a week. Of course, adequate knowledge of the theory of
formal languages is required to do this. As for the syntactic part of a language
definition, an existing grammar that fulfills the LL(k) criterion can be
adopted. If the criterion is not fulfilled, the grammar can still be adopted
after some modifications.
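
A typical modification of this kind is the removal of left recursion, which no
LL(k) parser can handle. The abstract does not say which transformations were
actually used; the Python sketch below merely illustrates the textbook rewrite
of immediate left recursion, with a hypothetical list-of-symbols representation
of productions.

```python
def eliminate_left_recursion(name, alternatives):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon.

    alternatives is a list of productions, each a list of symbols.
    Returns a dict mapping non-terminal names to their new alternatives;
    an empty list stands for epsilon. Only immediate left recursion
    is handled.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    others = [alt for alt in alternatives if not alt or alt[0] != name]
    if not recursive:
        return {name: alternatives}
    tail = name + "'"
    return {
        name: [alt + [tail] for alt in others],
        tail: [alt + [tail] for alt in recursive] + [[]],
    }
```

For example, the left-recursive rule `expr -> expr + term | term` becomes
`expr -> term expr'` together with `expr' -> + term expr' | epsilon`, which
describes the same language and can be parsed top-down.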

The performance of the engine is quite good - it scales with the number of
integrated languages as well as with the length of a document. We can conclude
that the achieved results are promising. We plan to continue integrating more
languages into NetBeans, and also to extend the engine's capabilities (e.g. to
support LR grammars).