Overview
Were you linked here, but not sure where to start? Maybe try a quick reference guide:
- Adding a new input language to VerCors
- Adding a new type of specification to VerCors
- Making VerCors do something other than verifying files
Introduction
VerCors is a tool that tries to prove programs correct. Correct may mean that the program terminates, that it does not crash, that it never throws an exception, or that it returns the correct result subject to some specification.
VerCors accepts programs in a variety of languages. For each language (also: frontend) the grammar is altered slightly, maintaining compatibility with the source language grammar, so that we may add various annotations to the program. These annotations aid in verifying the program by specifying its intended behaviour. This is crucial in automating the proof of the program.
As a tool VerCors works by passing the program through a large number of transformations. In general each transformation reduces the number of features in the program. You may think of features as different kinds of statements and operators. Each transformation maintains several invariants; the most important one is that of error transformations: as a statement \(P\) is compiled into a different statement \(Q\), we have to remember how to translate errors about statement \(Q\) back into errors about statement \(P\).
An intermediate program is stored in VerCors as an Abstract Syntax Tree (short: AST). The internal language is also referred to as the Common Object Language (short: COL). Each part of the transformation is structured as a Rewriter, which visits each element in the AST. After visiting a node in the tree, the rewriter must specify what that node is replaced with. By default, the rewriter will recurse further into the tree, keeping the kind of node the same. For example, if we ask a default rewriter what \(p + q\) must be rewritten to, we will then ask what \(p\) and \(q\) will rewrite to. Supposing they rewrite to \(p'\) and \(q'\) respectively, the rewriter will by default restore the plus operator and return \(p' + q'\).
Around the transformation chain there are several steps. Unfortunately, syntax trees do not fall out of thin air, so we must first parse our text input into a tree. In VerCors, this is largely handled by the ANTLR parser generator. We translate parse trees into COL trees as part of the parsing stage, but there are no surprises there: as a matter of principle there is a corresponding COL node for each parse node. Next there is resolution: the process of associating each name in the tree with a referrable node. For example, variable names 'point' to the declaration of the variable. This stage usually presupposes that the file compiles at all.
After all the transformations we have a very simple program. On the other hand, proving it to be correct is no simple matter. We delegate this responsibility to Viper, a tool designed to automatically verify this dialect of program. Finally, a (hopefully empty) list of errors from Viper is translated into appropriate errors for the initial input to VerCors.
This completes the intended cycle of usage for VerCors: submitting a program to VerCors leads to a list of suggested improvements, which leads to editing the program, which leads to a new submission to VerCors. We have some qualitative wishes here: errors should be clear, specific and not spurious. They should include context where appropriate, such as an example of how the error might occur. Ideally, VerCors is performant and can quickly establish whether or not there is an error.
The central goal is that we can encourage writers of critical software to prove their programs to be correct, so that their software is reliable.
Quick Reference Guide
Were you linked here, but not sure where to start? Maybe try a quick reference guide:
- Adding a new input language to VerCors
- Adding a new type of specification to VerCors
- Making VerCors do something other than verifying files
Adding a Language
Adding a new language (also: frontend) to VerCors is a large task. To start, consider the following list:
- Add a language and extension mapping in `vct.main.stages.Parsing.Language` and `vct.options.types.ReadLanguage`.
- Add a grammar for your language.
  - Our default stance is to parse languages ourselves with ANTLR v4. See if yours is available here.
  - Place your lexer and parser as `LangXLexer.g4` and `LangXParser.g4` in `src/parsers/antlr4`. `LangXLexer.g4` should import `SpecLexer.g4`. Make sensible lexer rules to define specifications inside comments. See the bottom of `LangJavaLexer` for an example.
  - Make `XParser.g4`, which imports `SpecParser` and `LangXParser`. This should connect the specification and language by defining the missing grammar rules. See `JavaParser.g4` for an example.
  - Change `LangXParser.g4` to include specification constructs like contracts, `with`/`then` specifications, and more. All specification rules start with `val`, so try searching for `val` in `LangJavaParser.g4`.
- Implement a parser in `src/parsers/vct/parsers/parser/ColXParser` that extends `AntlrParser`.
- Implement a converter in `src/parser/vct/parsers/transform/XToCol` that extends `ToCol`. The general process is to start with the root rule of your grammar, match its alternatives, and implement further `convert` methods as you encounter parse rules. Anything that is not supported can be put aside by calling `??(rule)` (see the sketch after this list).
- Connect your `Parsing.Language` and `ColXParser` parser in `vct.main.stages.Parsing.run`.
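For orientation, one `convert` method in such a converter might look roughly like the following. This is a hedged sketch: the context names (`StatementContext`, `Statement0`) and the `origin` helper are hypothetical stand-ins for whatever ANTLR generates for your grammar.

// Sketch of a converter method in XToCol; the parse-rule names are placeholders.
def convert(implicit stat: StatementContext): Statement[G] =
  stat match {
    case Statement0(_, cond, _, body) => // a hypothetical "if" alternative
      Branch(Seq((convert(cond), convert(body))))(origin(stat))
    case other =>
      ??(other) // put aside alternatives that are not supported yet
  }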
Congrats, you added a frontend! Most likely the real work only starts from here, though. Consider the following:
- If your language adds features that did not exist in VerCors before, consider adding them to the "core" of COL by adding a nice abstraction of your new language feature. For example, we started supporting GPU kernels, for which we added an abstract `ParBlock`.
- Your converter should contain no logic. Instead, implement any logic prefacing the normal rewriter chain in a language-specific rewriter. These rules are called in `LangSpecificToCol`, then deferred to specific rewriters like `LangXToCol`.
- When possible, prefer simple logic in `LangSpecificToCol` connecting to a nice abstraction in COL. It is helpful to future developers of VerCors that the rewriting logic occurs in the main rewriting chain when possible.
- Nodes are cheap! Instead of bolting on new behaviour to existing nodes, just make your own.
- Add rewriters in `vct.rewrite` that compile away your new features to existing features.
  - They should be added somewhere in the rewrite chain of `vct.main.stages.SilverTransformation`.
  - Prefer writing rewriters that do not care where they are in the chain.
- Unexpected results? Try using flags like `--output-after-pass`, `--trace-col`, `--backend-file-base`.
- Unacceptably slow parsing? See here.
New specifications
To add a specification to VerCors, you should consider whether your specification is language-specific, or whether it can be implemented as a language-agnostic feature.
For language-agnostic specifications:
- Add grammar rules to `SpecParser.g4`.
- Add new keywords in `SpecLexer.g4`. Consider prefixing your keyword with a backslash.
- Add translation logic in all instances of `vct.parsers.XToCol`. An unfortunate fact is that the specification translation is kept the same between languages by copy-pasting all specification translation manually. This is because the specification grammar types differ by frontend, but are structurally identical.
- Nodes are cheap! Instead of bolting on new behaviour to existing nodes, just make your own.
- Add rewriters in `vct.rewrite` that compile away your new features to existing features.
  - They should be added somewhere in the rewrite chain of `vct.main.stages.SilverTransformation`.
  - Prefer writing rewriters that do not care where they are in the chain.
We have few specifications that are language-specific on the parsing level. A notable exception is `valClassDeclaration`, which contains rules for declarations that only make sense when they belong to a Java-style class.
New modes
The different tools that are built with or on top of VerCors, such as Alpinist, VeSUV and VeyMont, are internally implemented as modes of VerCors. To start:
- Add a mode in `vct.options.types.Mode`.
- Add a flag that sets your mode in `vct.options.Options`.
  - Also add any additional mode-specific options under your flag as children of that flag.
- Match your mode in `vct.main.Main.runMode`.
  - The outer shell of your mode should typically be an assembly of `Stages`, which is helpful in progress printing.
  - Try to re-use stages in `vct.main.stages` if possible, otherwise implement your own logic as a `hre.stages.Stage` (see the sketch after this list).
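For orientation, a custom stage might look roughly like the following. This is a hedged sketch: the members of `hre.stages.Stage` shown here (`friendlyName`, `progressWeight`, `run`) and the `declarations` field of `Program` are assumptions to be checked against the actual definitions.

// Sketch of a mode-specific stage; the trait members are assumed, not definitive.
case class CountDeclarations[G]() extends hre.stages.Stage[Program[G], Int] {
  override def friendlyName: String = "Counting declarations"
  override def progressWeight: Int = 1
  override def run(in: Program[G]): Int = in.declarations.size
}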
Technical Setup
This chapter describes some of the technical setup around VerCors, like version control, programming language choice, and libraries. It can be referred back to when evaluating changes to the setup.
Workflow
Since VerCors is made by Real Humans™ we agreed on some administrative conventions to stick by:
- If you have a bug or clearly delineated feature request, it should exist as an issue on our tracker on GitHub: https://github.com/utwente-fmt/vercors/issues. The goal is to have zero issues: that should be achievable, since it should be clear what to do from the issue.
- If you have a more vague feature request, or a question, or you're not sure if something is a bug, it should be a discussion: https://github.com/utwente-fmt/vercors/discussions.
- If you're working on VerCors code longer term, we prefer that you do so in a branch in the main repository, rather than working in a fork. Please ask for push access when starting out.
- If you're working on a single issue, please assign yourself the issue.
- We use feature branches and pull requests, so once you're done please file a pull request. Continuous integration will run all examples on your changes automatically, so it's wise to check if it succeeded.
- We do not have a linting policy at this time. We ask that you try to follow the local conventions of files you're editing and do not reformat whole files.
Language
VerCors is a JVM project, and the current main language we use is Scala. We do accept contributions in both Scala and Java, but we highly encourage you to learn some Scala before contributing.
Since the main backend of VerCors is currently Viper, we enjoy the benefit of calling into it directly and being able to debug it directly, since VerCors is also a JVM application.
We primarily develop VerCors in Scala for several reasons:
- Scala has great support for functional programming paradigms. VerCors primarily deals with immutable data-structures and pushes out effects, like logging verification failures and writing out files.
- Scala supports pattern matching, which is very suitable for analysing ASTs.
- Scala supports metaprogramming, which is very handy for generating functionality that recurses into the ASTs: rewriters, comparators, etc.
- Other nice-to-haves: a good collection API (sequences, sets), optional structural equality, sealed types.
On the surface it seems Kotlin has significant feature overlap with Scala, but some of the above points are missing in Kotlin and are important to us. We may revisit that stance in the future.
Build System
The default build system for Scala, and for hybrid Scala/Java projects, is SBT. However, our current build system is mill. For a general overview of why we don't use SBT anymore, see here.
For us specifically we considered these things:
- For meta-build alterations in ColHelper, it was annoying to remember that you need to reload.
- You needed to alter the IntelliJ settings immediately, otherwise the project does not build by default.
- The run script broke for almost every user, because we use a custom classpath caching solution; we need that solution because `sbt "run example.pvl"` starts up far too slowly (~7 seconds when VerCors is already entirely compiled).
- Most people had encountered a situation where the project just breaks, and you need to re-import everything.
- Anything custom was hard to implement, because we didn't understand enough about how sbt works.
Mill on the other hand does not have these drawbacks:
- It is almost just Scala.
- Tasks are a straightforward and easy abstraction:
  - Custom stuff is easy to implement, even when interacting with mill internals.
  - Caching is essentially free due to the data model.
  - Tooling around mill is easy, because the hierarchy in the build definition corresponds to the hierarchy in `out`.
  - You can chain tasks entirely as you like and mill is fine with it: e.g. `col.generatedSources` depends on `colMeta.meta.runClasspath`.
- Mill boots with the VerCors build in under one second, e.g. `time ./mill -j 0 vercors.runScript` -> `real 0m0,753s`
  - This means we can finally get rid of our own classpath abstraction: we can be sure VerCors runs normally every time.
- Mill seems to work well with IntelliJ via the BSP integration, which does not need any altered settings in IntelliJ.
- Mill seems to work well with IntelliJ via the BSP integration, which does not need any altered settings in IntelliJ.
Common Object Language
The Common Object Language (short: COL) is the intermediate representation of VerCors. The main component of VerCors is to apply a sequence of transformations (rewriters) to the COL tree until it is suitable to submit to a backend. This chapter describes concepts around the structure of COL.
Nodes
COL programs are stored in a tree-like immutable data structure. All nodes are defined in `src/col/vct/col/ast/Node.scala`. Other than some primitive types and collection types like `Int`, `String`, `Seq[_]` and tuples `(_, _)`, all types in the tree are descendants of `Node`.
Nodes are split into two kinds; the sketch after this list illustrates their differing equality behaviour:
- Descendants of `NodeFamily`. These are regular nodes, like expressions and statements.
  - They have structural equality, meaning comparing them constitutes comparing their fields.
  - It makes sense to say that each family of nodes should rewrite to itself: one `Statement` should always be rewritten to another `Statement`.
- Descendants of `Declaration`. These are nodes to which we can refer.
  - They have reference equality, which means `new Variable(TInt) != new Variable(TInt)`.
  - It makes sense to ask in which scope the declaration can be referred to. For example, a reference to an argument makes no sense outside the method it is declared in.
  - We should be able to rewrite a declaration to zero, one or more successors of itself.
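A minimal sketch of the two equality regimes, assuming an implicit `Origin` named `o` is in scope:

// NodeFamily descendants compare structurally:
val a: Expr[G] = Plus(IntegerValue(1), IntegerValue(2))(o)
val b: Expr[G] = Plus(IntegerValue(1), IntegerValue(2))(o)
assert(a == b)

// Declaration descendants compare by reference, even when structurally equal:
val v1 = new Variable[G](TInt())(o)
val v2 = new Variable[G](TInt())(o)
assert(v1 != v2)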
We will not exhaustively describe all kinds of node here, but we highlight the ones that have special properties.
Expr
All expressions have a type:
trait ExprImpl[G] { this: Expr[G] =>
def t: Type[G]
}
You may presume that the node is `check`ed before its type is queried; `t` may crash on an unchecked node. The type of expressions is closely related to coercion.
Type
The relations between types must be defined in `Types.leastCommonSuperType` and `CoercionUtils.getCoercion`.
References
The normal way to create references to a declaration in a program is by naming it. This creates a few obligations, for example:
- Declarations must have a consistent and unique name with regard to their scope
- References to a declaration must name an existing declaration
- Renaming a declaration must consider all its references (otherwise, we may silently refer to the wrong declaration)
- Moving declarations or references across scopes must consider what declaration we are pointing to (otherwise, we may silently refer to the wrong declaration)
- Importing internal definitions and generating side conditions must generate fresh unique names (otherwise, we may silently refer to the wrong declaration)
We had this approach in earlier versions of VerCors, leading to declarations with a lot of underscores, and `VERCORS`/`SYSTEM`/`INTERNAL` prefixes. Even so this would probably be pretty manageable as long as the number of rewriters is not too big. As of writing we have 67 different rewriting steps, so this is no longer the case.
No names!
We have instead elected to get rid of names entirely, replacing them with explicit references instead. This is a lie, because the initial input to VerCors of course does contain names, but it is a convenient lie: in all other passes the names are not relevant to rewriters. For debug output the preferred name for declarations is nevertheless stored in the `preferredName` of the origin of the declaration.
Tying the knot
For simple declaration types like variables, we need to do nothing special to rewrite them. We can think of the following order to rewrite them:
- Rewrite all the variables of a `Scope`
- Rewrite the body of the `Scope`, which may or may not contain references to the variables.
We can then account which `Pre`-variable is rewritten to which `Post`-variable, and substitute references accordingly. Note that the variables of the `Scope` may themselves contain references to variables (e.g. type variables), but we can argue that by the structure of the scopes they have already been rewritten.
This convenient structure manifestly does not work for e.g. method invocations. Consider this simple program:
void p() {
q();
}
void q() {
p();
}
There is no way to order the rewrites correctly here:
1. We cannot name the successor of `p` before rewriting `p`
2. We cannot rewrite the body of `p` before knowing the successor of `q`
3. We cannot name the successor of `q` before rewriting `q`
4. We cannot rewrite the body of `q` before knowing the successor of `p`
5. goto 1
The way we resolve this is by making references lazy in general. We can still refer directly to declarations that we know of in context, such as temporary variables, by wrapping them in `DirectRef`. If not, we supply the computation that will resolve the reference to a `LazyRef`. Importantly, this computation need not be valid immediately; it must be correct only after the `Rewriter` has completed.
By default, we store one-to-one successors in a map. References then by default construct a `LazyRef` that looks up the successor in this map. The actual mechanics of this are explained further in Resolution and Succession.
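A minimal sketch of this mechanism, assuming `LazyRef` accepts the lookup as a by-name computation:

import scala.collection.mutable

// A hypothetical one-to-one successor map, filled while rewriting.
val successors = mutable.Map[Procedure[Pre], Procedure[Post]]()

// The lookup is only evaluated once ref.decl is first queried, so the
// successor of oldProc need not be known yet at this point.
val ref: Ref[Post, Procedure[Post]] = new LazyRef(successors(oldProc))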
Origins
The origin is a mandatory field in every node, and is the facility by which we store any and all metadata about the node. It arranges the preferred naming scheme of nodes that should have a name, it describes the context in which the node was created, and can remember what text range(s) in the input participated in making the node.
As it is metadata that is nominally not semantically relevant, it does not participate in the structural equality of nodes.
Naming
The name of a node is not relevant to rewriting logic, so it is included in the metadata. For simple source input, the preferred name is just the same as it is in the source. For generated nodes we can use `PreferredName` and `NamePrefix` to indicate the preferred name, or the shorthand `Origin.where(name = ..., prefix = ...)`. We emphasize once more that the name has no semantic relevance: VerCors would be equally sound if all preferred names were `x`.
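For example, a rewriter might name a generated variable like this (a sketch that assumes the shorthand described above, and a `TRef` type for the variable):

// Derive an origin from the node that prompted the variable, preferring the
// name "exc"; with prefix = ... the namer would generate exc0, exc1, ... instead.
implicit val o: Origin = inv.o.where(name = "exc")
val excVar = new Variable[Post](TRef())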
The preferred name is currently used in two places:
- `vct.col.print.Namer` for debug output and some source transformation outputs (e.g. VeyMont);
- `viper.api.transform.ColToSilver` for serialization into the Viper language.
Context
For debugging purposes we can add context to an origin. This is typically not necessary for reporting verification failures, as they are reported very close to the input origin. Nevertheless it is useful to know why a node exists, so we mostly add ad-hoc instances of `LabelContext`, or its shorthand `Origin.where(context = ...)`.
Source Context
Most origins contain at least one pair of `ReadableOrigin` and `PositionRange`. It is often tempting to "just ask" what file position a node comes from, but the reason a node exists can rarely be explained away so simply. Nevertheless it is useful to know what source code primarily contributed to the node.
`Origin` extends `HasContext`, which defines `messageInContext` to render a message in the context it describes. This will usually contain the file name, as well as the fragment in the file that the origin refers to. There is also `Origin.shortPosition`, which will attempt to locate an arbitrary file position in the `Origin`; it is generally not advisable to use this outside debugging.
Checking
In general we attempt to design the AST in such a way that it is not possible to represent invalid situations. When that is not possible, we implement structural checks on the AST when it is convenient.
The Ambiguous pattern
In a lot of cases, to avoid duplicating code, we first implement a rigid structure that does not yet connect to the frontends. An example of this is `Location`, which has rigid alternatives per type of heap location. Since `Perm(x.f, _)` simply contains an expression `x.f` rather than something that is immediately identifiable as a location, we punch a hole in `Location` named `AmbiguousLocation` that can contain that expression for as long as it is not yet resolved. We then implement a rewriter that scans for the allowed forms of expression in `AmbiguousLocation` and translates them to a proper location. If the form is not proper, we can simply throw a `UserError` in the rewriter.
The alternative would be to implement `check` for `AmbiguousLocation` and ensure that it contains a correct location. It is not that clear when a location is valid, however. For example, a `JavaDeref` can be resolved to a built-in like `seqn.size`. That does not resolve to a correct location, but it is more convenient to wait until the `JavaDeref` becomes a `Size`, rather than needing to be aware of every possible resolution of `JavaDeref` in `AmbiguousLocation`.
Type checks
One dimension along which we cannot escape checking is that of type constraints. Every node that contains an expression defines whether it imposes a type constraint on it in `CoercingRewriter`. This coercion is computed in the implementation of `check` for every node.
Structural checks
As said before, the number of structural checks implemented in VerCors is currently very limited. Nevertheless we are still missing some important ones at the time of writing, so we should look into this more.
The ones that do exist are implemented by extending `CheckError`, and then overriding `check` in the Impl-trait of the node you want to implement a check for. Note that it is typically desired behaviour that the `super`-implementation of `check` is included in the result.
Coercion
Various nodes contain expressions, such as other expressions (e.g. `Plus` contains its two arguments) and non-expressions (e.g. `Branch` contains the conditions for its branches). Each expression position in the tree may induce a typing constraint on the expression in it. For example, the conditions in `Branch` have to be booleans:
case Branch(branches) =>
Branch(branches.map { case (cond, effect) => (bool(cond), effect) })
and the arguments of `Plus` may be integers or rationals:
case Plus(left, right) =>
firstOk(e, s"Expected both operands to be numeric, but got ${left.t} and ${right.t}.",
Plus(int(left), int(right)),
Plus(rat(left), rat(right)),
)
Here we coerce each `cond : Expr[G]` into a boolean, and do nothing with `effect`. For `Plus` we pick the first of two transformations that succeeds: either both `left` and `right` can be coerced with `int`, or both can be coerced with `rat`.
Coercions serve two purposes: they check that the AST is well-typed, and they can be used in `Rewriter`s to enact transformations on the AST.
Type checking
The coercions in `CoercingRewriter` are run for every node during a `check`, in the implementation of `DeclarationImpl` and `NodeFamilyImpl` (which together span all nodes). They use a special descendant of `CoercingRewriter` called `NopCoercingRewriter` that applies no transformations along coercions.
All nodes that contain expressions should have their coercions defined in `CoercingRewriter`. If the expression should be of a simple type (class), you can use `coerce(expression: Expr[Pre], type: Type[Pre])`. Shorthands for several common types are defined, such as `rat`, `bool` and `int`.
If your node can be interpreted in multiple ways, determined by the type of its arguments, you can use the `firstOk` helper to select the first set of coercions that succeeds on the arguments. An error is raised only if all alternatives fail.
Transformations
It is occasionally useful to implement a transformation along the description of a coercion. Coercions are described by the node family `Coercion`. For example, coercing a `seq<int>` into a `seq<rational>` is described by `CoerceMapSeq(CoerceIntRat(), TInt(), TRational())`. Rewriters that extend `CoercingRewriter` have access to this coercion object by overriding `applyCoercion`.
One example of this behaviour is in the overloaded interpretation of `null`. In VerCors the array types are entirely separate from the class types. Thus, `null` can be coerced into either an array or a class type. In the rewriter that encodes arrays, `ImportArray`, we filter for usages of `null` for arrays by translating a `CoerceNullArray` coercion to the appropriate value for a null array.
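A hedged sketch of what such an override might look like; the exact signature of `applyCoercion` and the helper producing the null-array value are assumptions:

// Intercept the array-null coercion; defer all other coercions to the default.
override def applyCoercion(e: => Expr[Post], coercion: Coercion[Pre])(implicit o: Origin): Expr[Post] =
  coercion match {
    case CoerceNullArray(_) => arrayNullValue() // hypothetical helper
    case other => super.applyCoercion(e, other)
  }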
Safety
Note that very nearly all defined/permissible coercions are promoting coercions. That is, they only admit coercions when a value definitely fits into the target type of the coercion. This is defined programmatically by defining `isPromoting` in `CoercionImpl`. Currently the only exception is rationals into `zfrac` into `frac`. We have not investigated yet whether it is a good idea to define more implicit coercions to represent situations like demoting a `uint64` into a `uint32`; perhaps they are better suited as explicit non-`ApplyCoercion` nodes at an early point. Failures in coercion are reported to the blame in `Program`, mostly because we have no other good place to put it.
The other safety concern related to promoting coercions is that sets only map over promoting coercions due to the fact that the default coercion is defined as \(x \in \textsf{input} \leftrightarrow \textsf{coerce}(x) \in \textsf{output}\). Alternative encodings we investigated are cumbersome because they add additional necessary proof steps.
Pretty-Printing
The pretty printer of VerCors is derived from "A prettier printer" by Philip Wadler, accessible here. Typesetting a COL AST is done in three stages: it is first translated into a document tree, then a list of elements, then text.
COL
Each node implements `layout`, which is externally accessible via `show`. The only purpose of `show` is to wrap the document tree with `NodeDoc`, so that we may later recall what document tree corresponds to what COL tree. If you forget to implement `layout`, a debugging representation of the node is printed that does not look entirely terrible.
`Expr` is a bit more special: it demands an implementation of `def precedence: Int` for each expression. You can use `bind`, `assoc`, `nassoc`, `lassoc` and `rassoc` as conveniences to see whether sub-expressions require parentheses (a sketch follows this list):
- `bind` adds parentheses if the precedence of the subexpression is lower than the specified precedence
- `assoc` adds parentheses if the precedence of the subexpression is strictly lower than our precedence (i.e. we "associate" with the subexpression)
- `nassoc` adds parentheses if the precedence of the subexpression is less than or equal to our precedence (i.e. we do not "associate" with the subexpression)
- `lassoc` is the correct implementation for a left-associative binary operator (e.g. `+`, `*`)
- `rassoc` is the correct implementation for a right-associative binary operator (e.g. `::`, `==>`)
Document tree
The elements of a document tree are as follows:
- `Empty`, which renders as nothing
- `Text(String)`, which simply represents a piece of text. Must not contain newlines.
- `Cons(Doc, Doc)` (or `Doc <> Doc`), which represents the unspaced concatenation of two trees.
- `Nest(Doc)`, which indicates the tree should be nested (i.e. indented)
- `Group(Doc)`, which indicates that newlines should preferably be removed from the tree.
- `Line` and `NonWsLine`, which are both newlines. If collapsed in a `Group`, only `Line` is replaced with one space.
- `NodeDoc(Node[_], Doc)`, which has no effect on typesetting, but indicates this document tree corresponds to this node.
The deviations from Wadler's algorithm are:
- We have no `<|>` to indicate alternatives. Instead our very opinionated `Group` has in effect two alternatives: we prefer for it to be printed on one line, but it may keep the `Line`s and `NonWsLine`s if necessary to reduce its horizontal occupancy. When `Group`s are nested we first peel off the outer `Group`, keeping the inner `Group` if possible.
- The `flatten` of `NonWsLine` is `Empty` instead of `Text(" ")`
- `NodeDoc` is there for post-processing
Otherwise the typesetting algorithm is a straightforward transliteration of the paper.
`Doc` has several convenient operators to make trees (a usage sketch follows this list):
- On the right hand side of operators, you can write anything that implements `Show`, such as `Node[_]`
- `x <> y` is `Cons(x, y)`
- `x <+> y` is `x <> " " <> y`
- `x </> y` is `x <> NonWsLine <> y`
- `x <+/> y` is `x <> Line <> y`
- `x <>> y` is `x <> Nest(Line <> y)`
- `Doc.fold` applies a function on pairs of `Doc` until there is only one `Doc`. If the sequence of `Doc`s is empty, the result is `Empty`.
- `Doc.spread`, `Doc.lspread` and `Doc.rspread` place spaces between `Doc`s. `r` has an extra space on the right, `l` has an extra space on the left.
- `Doc.stack` places newlines between `Doc`s that reduce to a space in a `Group`.
- `Doc.arg` indents a `Doc` on a new line, then appends another newline. Both newlines reduce to nothing in a `Group`. Appropriate for an argument to a function or control structure like `while`, when wrapped in a `Group`.
- `Doc.args` combines comma-separated arguments with newlines that reduce to a space.
- `Doc.spec` and `Doc.inlineSpec` surround the `Doc` with `/*@ @*/` if necessary. Note: the context only indicates to the tree argument that we are now in a spec if the argument to `Doc.spec` is not yet a `Doc`, but something yet to be shown. This can be delayed with `Show.lazily` if necessary.
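Combining a few of these, a hedged sketch of typesetting a call `f(a, b, c)` so that the arguments collapse onto one line whenever they fit, assuming `args` is a sequence of expressions:

// Group prefers a single line; Doc.args supplies comma separators and the
// newlines that reduce to spaces when the group collapses.
Group(Text("f") <> Text("(") <> Doc.args(args) <> Text(")"))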
Elements
After typesetting the document we are left with a list of elements, which are one of two variants:
- `ELine(indent: Int)`, which renders a new line with a given amount of indentation. The context contains an indentation string, which is repeated `indent` times.
- `EText(text: String)`, which is rendered as its constituent text.
Additionally, we again remember what corresponds to which node, for post-processing:
- `EStart(Node[_])` indicates the start of a region that represents that node.
- `EEnd(Node[_])` indicates the end of that region.
The metadata is used in `Doc.highlight`, which can point to a node in a rendered tree.
Any node that implements `Show` has convenience functions `write(Appendable)` and `toStringWithContext(Ctx)` to convert the `Show` to text.
Context
The class `vct.col.print.Ctx` is available when rendering nodes and can be updated when the parent context is relevant (such as being inside a specification). Otherwise it contains information like declaration names, the horizontal width, etc.
Comparing
Equality
All nodes already implement `==`, which corresponds to the property that the nodes are semantically equivalent. This means that the following properties of nodes are considered when checking them for equality:
- Descendants of `NodeFamily` are deemed equal by structural equality: they must be of the same type, and have the same fields recursively.
- Descendants of `Declaration` are deemed equal by referential equality: two separate instances of structurally equal `Declaration`s are not equal.
- The `blame` of nodes is not considered in equality: this is an unfortunate design mistake. Blames should be redesigned so that they are a structural component of the AST.
- The origin of nodes is not considered in equality: this is correct, because the origin only contains metadata not relevant for the boolean verification result of the node.
If this notion of equality does not fit, you can instead use `compare`.
Comparison
Each node implements `compare` through code generation. Its only argument is the node to compare to, and it returns a lazy list of `CompareResult`. The idea is that `compare` recurses into the left and right argument simultaneously, reporting any differences along the way. The three alternatives for `CompareResult` are:
- `StructuralDifference`: in this subtree the kinds of node are different, or they have a different structure at that level. For example: a node that contains a `Seq` has a differing number of arguments.
- `MatchingReference`: so far the tree is structurally equal, and we have recursed to a point where left and right are a reference to a declaration. The references may or may not be equal.
- `MatchingDeclaration`: these declarations are structurally equivalent, but they may or may not be reference-equal.
From here several useful abstractions are implemented (a usage sketch follows this list):
- `Compare.compare`: the same as `compare` above, but attempts to reconcile structural differences by bringing left and right into a specified normal form. This is useful to compare nodes e.g. under equivalence of associativity of `**`.
- `Compare.getIsomorphism`: attempts to match up `MatchingReference`s and `MatchingDeclaration`s that point to the same declarations. If this succeeds, we say that left and right are isomorphic by substituting left declarations for right declarations. In that case the result is `Right`: a map of left-declaration to right-declaration. If not, a list of differences is returned: either structural differences or declarations that cannot be reconciled.
- `ApplyTermRewriter`: uses `compare` to implement rewriting rules, by explicitly accepting a `StructuralDifference` in the case that the left term is a `Local` used in the rule pattern.
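A hedged sketch of using `Compare.getIsomorphism`, with the result shapes assumed from the description above:

// Decide whether two trees are equal up to a renaming of declarations.
Compare.getIsomorphism(left, right) match {
  case Right(mapping) =>
    println(s"isomorphic; ${mapping.size} declarations matched up")
  case Left(differences) =>
    println(s"not isomorphic: $differences")
}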
Serialization
A lightly used feature of COL is serialization and deserialization: the ability to store a COL tree as an array of bytes, and the ability to translate the bytes back into a COL tree.
Serialization on the VerCors side
Each node defines two serialization methods:
def serialize(decls: Map[Declaration[G], Long]): vct.col.ast.serialize.<Node>
def serializeFamily(decls: Map[Declaration[G], Long]): vct.col.ast.serialize.<NodeFamily>
The `decls` parameter is used to store the representation of references in the AST. There is also a utility method `vct.col.ast.Serialize.serialize` that computes a standard assignment of declarations to `Long` for a `Program`.
For deserialization, objects are generated for each node in `vct.col.ast.ops.deserialize`, but again the `vct.col.ast.Deserialize.deserialize` method is there for conveniently deserializing `Program` nodes.
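A hedged round-trip sketch; the exact signatures, and the `toByteArray`/`parseFrom` helpers of the generated protobuf bindings, are assumptions:

// Serialize a whole Program to bytes, then read it back.
val bytes: Array[Byte] = Serialize.serialize(program).toByteArray
val restored = Deserialize.deserialize(vct.col.ast.serialize.Program.parseFrom(bytes))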
Usage of serialization
Currently the only usage of serialization internal to VerCors is to cache library files and user inputs. Library files are cached by default, which saves us from having to continually parse them on each run of VerCors. User inputs are currently only cached when a development flag is enabled.
Serialization outside VerCors
(De)serialization takes an extra step between COL trees and bytes through Protobuf. For those familiar with protobuf, we break with two important standards that are common for protobuf definitions:
- We do not promise backward or forward compatibility: we are still in a stage where we extensively experiment with node representation, so we are careful that communication of serialized programs only occurs between identical versions of the serialization format. We may want to have some form of backward compatibility in the future.
- The protobuf definition is not the source of truth. Instead we derive a protobuf definition from `Node.scala`. It may be sensible to explicitly maintain a version of the protobuf definition in tandem with switching to a notion of backwards-compatible nodes.
On to the good parts: for any VerCors version you can find its protobuf definitions at `out/vercors/col/helpers/global/generate.dest/vct/col/ast/col.proto`. You can use this file to generate bindings for all common programming languages. This effectively makes it so that any programming language can be used to generate an input to VerCors, e.g. by extending a compiler in whatever language it is written.
Resolution
Resolution is the process of assigning a meaning to a text name. Resolution occurs in two stages:
- `vct.col.resolve.ResolveTypes`: resolves the meaning of types, and loads any library types that are mentioned but not in the input. Declarations with a type can be important in resolving other references, such as field dereferences.
- `vct.col.resolve.ResolveReferences`: resolves everything else.
Both resolution stages have the same structure. At each node we receive a context that decides how we resolve that particular node. Before resolving any node, we first resolve all of its children. This is because the type and well-formedness of children is more often relevant to the parent than the well-formedness of the parent is to the children. The context is expanded upon by a node before passing it to its children. The recipe is thus as such:
- `enterContext` decides how to expand the context from a node
- `resolve` recurses into the node's children with the updated context
- `resolveOne` is called for the node, which may assume the children are well-formed.
`ResolveTypes` is followed by its companion rewriter `LangTypesToCol`, which sets the resolved references, held in mutable state, in stone. Dually, the `ResolveReferences` resolver is followed by `LangSpecificToCol`.
From parse tree to COL tree
VerCors has two ways of representing a text name:
- A node contains a `Ref` that is of type `UnresolvedRef`. This is appropriate for places where the name is exactly one label (as opposed to a fully qualified name like `java.lang.String`), and will clearly resolve to a category of declaration. This is unfortunately quite often not the case.
- A node remembers the text name temporarily, and contains mutable state to receive the result of resolution. An example of this is `JavaLocal`.
This is an area of VerCors that should be refactored: there should be one clear way to represent an unresolved name.
UnresolvedRef
This class contains mutable state in its field `resolvedDecl`, which can be filled by calling `UnresolvedRef.resolve`. Most occurrences of `Ref` have an entry in `ResolveReferences`, but it is not clear that there is always a sensible way to implement this.
Referrable
Any other situation should just remember the text name from the input in a node, and contain mutable state that receives the result of resolving it. By convention, this is a `var ref: Option[?] = None`. Other code like `check` and `t` may assume the `ref` is filled.
Usually the resolution result points to something in the AST, in which case you should use `Referrable`. It can point to the following kinds of thing:
- Exactly one `Declaration`;
- A portion of a declaration, such as field #2 in the multi-declaration `int x, y, z;`;
- A reference to an intrinsic part of the language that cannot be directly parsed, such as cuda's `blockDim`;
- A reference to a VerCors built-in concept, like the `.size` of a sequence;
- A reference to a construct that exists implicitly, such as Java's default constructor.
It is a bit frustrating that all declarations are essentially duplicated under `Referrable`, but it does give us the flexibility to make names point to whatever we want.
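A hedged sketch of the second representation: a frontend node that carries its text name and a mutable slot for the resolution result, in the style of `JavaLocal`. The node name and body details are illustrative only.

// Sketch: a frontend-local expression that is resolved in ResolveReferences.
case class MyLangLocal[G](name: String)(implicit val o: Origin) extends Expr[G] {
  // Filled during resolution; check and t may assume it is present afterwards.
  var ref: Option[Referrable[G]] = None
  override def t: Type[G] = ??? // typically derived from ref once it is filled
}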
From COL tree to resolved tree
When using an `UnresolvedRef` or adding an `Option[Referrable]` variable, we should take care that an appropriate entry is added in `Resolve{References,Types}.resolveOne`. We split out the resolution by language (including the specification language), which usually derives an appropriate `Referrable` from the `{Type,Reference}ResolutionContext`. Note again that we are allowed to e.g. query the type of child nodes, which is very relevant in resolving e.g. instance methods.
From resolved tree to Ref
The companion rewriters `LangTypesToCol` and `LangSpecificToCol` translate resolved references into a `Ref` (that is not an `UnresolvedRef`). From then on, references are defined only by pointing at the declaration they resolved to. A consequence of this is that past the language-specific pass, pointing to anything other than exactly one declaration as above is not possible.
Note: the `LangSpecificToCol` pass has accumulated far too much logic. The ideal path to implement a feature in VerCors is to design the new concepts you need in the core of COL, and figure out a straightforward way to map to them in `LangSpecificToCol`. The task of `LangSpecificToCol` is not to lower language features into low-level concepts immediately, and VerCors often benefits from well-designed features in the core of COL across front-ends.
Rewriting
The main activity of VerCors is rewriting a program. We use the concept of a `Rewriter` to structure the transformation of a program tree in an understandable and disciplined manner.
A rewriter recurses into one tree, and calls dispatch for each element of the tree by default. To ensure that each part of the tree is actually rewritten, we use the concept of generations to make the prior tree and the new tree have unrelated types. Declarations are a bit special: each declaration need not be succeeded by exactly one declaration of the same kind, as opposed to structural nodes. By transforming a program tree the verification failures that may occur in the program change, so we explain how to transform the errors to be about the input by providing blames.
The later sections describe concepts that are more advanced, and require careful consideration as to whether they are appropriate. Nested rewriters can be used when the rewriting behaviour materially changes when recursing into a certain node. They can be confusing to read, because you might intuit that the patterns in the parent rewriter are still used, when they are not. Substitution can be helpful if your rewriting process is effectively two-step, but it is often more straightforward to glue simple behaviour onto one rewriter. Nevertheless it is often actually better to consider explicitly how you might model the substitution statefully anyway: this gives a clearer picture of what should happen when e.g. nodes needing substitution are nested.
Outline
When implementing a new rewriter you should make a class that extends `Rewriter`. Additionally it should have a companion object that extends `RewriterBuilder`: other than giving a `key` and `desc` to present to the user, it describes how to construct the rewriter with `apply`. The `apply` method is defined implicitly if you make your rewriter a `case class`.
The outline of an empty rewriter then looks like this:
package vct.col.rewrite
import vct.col.ast._
import vct.col.rewrite.{Generation, Rewriter, RewriterBuilder}
import vct.col.util.AstBuildHelpers._
case object MyRewriter extends RewriterBuilder {
override def key: String = "my"
override def desc: String = "Apply a transformation to the COL tree"
}
case class MyRewriter[Pre <: Generation]() extends Rewriter[Pre] {
}
Dispatch
A rewriter describes how a COL tree is transformed. The root of the tree is passed to one of the `dispatch` methods: each `dispatch` method decides what should happen next. By default we always recurse into the tree, querying `dispatch` for the child nodes of our current node. You might imagine that the default implementation of `dispatch` for a `Plus` node is structured like this:
def dispatch(e: Expr[Pre]): Expr[Post] =
e match {
case Plus(left, right) =>
Plus(dispatch(left), dispatch(right))(e.o)
/* ... other alternatives ... */
}
Structural transformations
The idea is now that you override dispatch methods for the nodes you would like to rewrite. For example, we might want to make an optimization that simplifies away additions of `0`:
val ZERO = BigInt(0)
override def dispatch(e: Expr[Pre]): Expr[Post] =
e match {
case Plus(IntegerValue(ZERO), e) => dispatch(e)
case Plus(e, IntegerValue(ZERO)) => dispatch(e)
case other => other.rewriteDefault()
}
A new concept here is `rewriteDefault`: it is the mechanism used to peel off one layer of a program tree and recurse into its children using dispatch. It is automatically generated for each node, here in `PlusRewrite`. It takes an implicit value of type `AbstractRewriter`, which is the rewriter used to call `dispatch` again. By default it picks the `implicit val rewriter` which is defined for each `AbstractRewriter`. Note that Scala requires that a match statement is complete; a default case as above also makes the match complete.
A common mistake is to erroneously call `rewriteDefault` too often: when recursing into nodes yourself you should almost always call `dispatch`, which will decide what to do with the sub-node.
Transforming Declarations
See also: Succession
Declarations are special because it is not required that a declaration is transformed into a declaration. This is also represented in the signature of the `dispatch` method for `Declaration[_]`:
def dispatch(decl: Declaration[Pre]): Unit
Nevertheless declarations have to be placed in the tree, and references to declarations need to be rewritten. Both are arranged through an instance of the `Scopes` class, one for each kind of declaration. Rewriting references is tricky, so this is explained later on in the section on succession.
A node that contains declarations, such as `Class[G](decls: Seq[ClassDeclaration[G]], ...)`, needs to come up with a new list of `ClassDeclaration[G]`. We first have to consider the appropriate `Scopes` instance: in this case `classDeclarations`. We call `classDeclarations.collect`, which pushes a collection buffer onto a stack. A lambda is passed to `collect` in which we can take any actions, chiefly to declare declarations of kind `ClassDeclaration`. By convention we call `dispatch` on all the `Pre`-state declarations that we are interested in, and (optionally) we can declare one or more other `ClassDeclaration`s we need to add. All this is also allowed to happen recursively: we can be in a statement somewhere in the class and still declare a helper method to the `classDeclarations` scope.
Combining this information we can obtain the new declarations of a class as such:
val newDeclarations = classDeclarations.collect {
cls.decls.foreach(dispatch)
}._1
Collect returns a tuple: a list of new declarations, and whatever the result of the lambda is (here: `Unit`). On the other side of the fence, the old class declarations are rewritten and declared to the scope. Reproducing the default behaviour for `ClassDeclaration`s might look like this, skipping over succession for the moment:
override def dispatch(decl: Declaration[Pre]): Unit =
decl match {
case decl: ClassDeclaration[Pre] =>
classDeclarations.declare(decl.rewriteDefault())
case other =>
allScopes.anyDeclare(other.rewriteDefault())
}
Our friend `rewriteDefault` makes another appearance here: it peels off the `ClassDeclaration` constructor and recurses into its components. In the case that a `ClassDeclaration` contains further declarations, it will call `collect` again in a similar way. For example, an `InstanceMethod` is a `ClassDeclaration` and contains declarations of kind `Variable` for its arguments.
After rewriting the declaration fully by opting to rewrite it in a one-to-one manner, it is declared to the buffer of `ClassDeclaration`s that we are currently collecting.
Lastly we must always complete the match, which is done by also rewriting other declarations one-to-one, and using `allScopes` to find the correct scope to put the declaration in.
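Putting these pieces together, a rewriter that rebuilds a class while collecting its declarations might look like this. This is a sketch; it assumes the generated `rewrite` copy method accepts the `decls` field by name, as it does for other node fields elsewhere in this chapter.

override def dispatch(decl: Declaration[Pre]): Unit =
  decl match {
    case cls: Class[Pre] =>
      // Collect the rewritten class declarations, then succeed the class with
      // a copy that contains them.
      val newDecls = classDeclarations.collect {
        cls.decls.foreach(dispatch)
      }._1
      globalDeclarations.succeed(cls, cls.rewrite(decls = newDecls))
    case other =>
      allScopes.anySucceed(other, other.rewriteDefault())
  }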
Generations
You may have noticed by now that each node carries a type parameter, usually denoted `G`, `Pre` or `Post`. This type parameter is not used or stored anywhere in the tree, and due to type erasure it is in fact entirely compiled away: it has no operational effect in the tool. Instead it is a facility by which we can make COL trees have entirely disjoint types. We maintain by hand the invariant that the subnodes of each node have the same "generation" type parameter as the node itself in the node declarations. Each node declares a type parameter `G`, which is passed along in all node types that the node contains.
All normal tree operations make sense as long as the generation type parameter matches between trees: we can take a tree of type `Node[G]`, and replace all occurrences of a `Local[G]` with an `Expr[G]`. On the other hand, there is no way to place a tree of type `Node[L]` in a tree of type `Node[R]` (unless `L` and `R` resolve to the same type).
This trick is useful in rewriters: we make the input tree type unrelated to the output tree type. The input node type uses `Pre` for the generation, whereas the output uses `Post`. This makes it so that you cannot return an input tree as though it is an output tree: you are required to rewrite the input explicitly. Although it is possible that optimizations exist where it would not be necessary to recurse into a particular type of tree, we find that eventually it becomes necessary anyway. A short parable: contract expressions are pure, so we should not have to rewrite them if we are only transforming method invocations (a side effect). Afterwards we add a feature where a method can be marked `/*@ pure */`, which translates the method to a pure function. It should then be possible to use this method in contract expressions, but this introduces a bug at great distance in our original rewriter.
Succession
See also: Transforming Declarations
The succession system is a pure convenience: it accounts when there is exactly one canonical successor of a declaration, then uses that successor to rewrite references to the declaration. When rewriting a declaration is left to the default dispatch implementation, there is always one canonical successor: the declaration it is rewritten into. This can be witnessed in the default implementation of `dispatch` for declarations, which is defined in `NonLatchingRewriter`:
override def dispatch(decl: Declaration[Pre]): Unit =
allScopes.anySucceed(decl, decl.rewriteDefault())
The method `anySucceed` both declares the new declaration, and accounts it as the successor of the old declaration. In the rare case that you need to only account the successor of the declaration, but not declare it, you can call `anySucceedOnly` instead.
Note it is typically not necessary to use `allScopes`: it is simply a dispatcher for the scopes by declaration kind, like `globalDeclarations`, `classDeclarations`, etc. For example, you can declare and account the successor of a procedure as such:
globalDeclarations.succeed(oldProc, newProc)
Rewriting references
The default way of rewriting any reference is to query the successor of the declaration it refers to. Note that this is by default always done lazily: the succession accounting may be delayed until the first time the reference is queried with `_.decl`.
There is no obligation to always use an instance of `Scopes` to account successors. For convenience you can opt to use `SuccessionMap`, which is a thin layer over a thread-safe regular map that allows the construction of `Ref`s that query the map lazily, and provides some useful errors otherwise. Even this is not mandatory: you can choose to use e.g. a `mutable.Map` instead if you want. It is however important to remember the law of succession:
When the successor of a declaration is not accounted, you must consider all positions that can reference that kind of declaration.
Luckily, for most kinds of declarations there are not that many nodes that can refer to them. It is helpful to consider that references to a broad type are extremely sparse. In particular, nearly all references at least stay within the type of their kind (e.g. `ClassDeclaration[G]`, `GlobalDeclaration[G]`) and the vast majority refer to a leaf type.
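A hedged sketch of `SuccessionMap`; the exact method for obtaining a lazy `Ref` from it is an assumption:

// Account successors in a custom map, e.g. when a procedure becomes a function.
val functionOf = SuccessionMap[Procedure[Pre], Function[Post]]()
functionOf(oldProc) = newFunction

// Construct a reference that looks up the successor lazily.
val ref: Ref[Post, Function[Post]] = functionOf.ref(oldProc)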
Scoping
The successor of a declaration is visible only in the scope where the declaration itself is visible. For example: it does not make sense to ask about the successor of a local variable if the variable belongs to a different method. What this enables is the simple-sounding property that it is usually safe to rewrite a node twice: the successors accounted in the second rewrite do not clobber the successors accounted in the first rewrite, because they are scoped correctly. We call nodes for which it is safe to rewrite them twice duplicable.
Most structural nodes are duplicable, and declarations are themselves not duplicable: you are not allowed to overwrite the successor of a declaration within one scope. Note that structural nodes that contain declarations (e.g. `Program`, `Forall`) are usually duplicable. The property that determines whether a node is duplicable is as such: whenever a declaration can (transitively) occur in a node, it must have a parent node that `@scopes` that kind of declaration.
Blames
One of the core tenets of VerCors is that technical verification errors are translated as precisely as possible to a problem in the input to the tool.
We submit a large query with many verification goals to our backend, Viper, and receive back a list of errors. This list is translated backwards to relate to the input through the blame system: each node that induces a verification goal must explain what to do with the error. The most common way to explain away an error is to just blame somebody else! Each rewriter gets an input tree, along with blames for each goal-inducing node. On the output side, we have to provide the blame instead, but for simpler rewrites there usually is an apparent blame we can use from the input side.
Blaming somebody else
If the type of error on the input and output side are the same, we can simply place the blame in the output tree without rewriting it; blames are special in this sense. A slight complication occurs when the types do not match. For example, we might be translating a well-formedness condition like `Blame[MapKeyError]` to a function precondition blame of type `Blame[InvocationFailure]`.
In this case we have to indicate how the error is translated:
case class MapKeyPreconditionFailed(get: MapGet[_]) extends Blame[InvocationFailure] {
override def blame(error: InvocationFailure): Unit =
error match {
case PreconditionFailed(_, _, _) =>
get.blame.blame(MapKeyError(get))
case ContextEverywhereFailedInPre(_, _) =>
PanicBlame("This rewriter does not generate context_everywhere clauses for map get functions")
.blame(error)
}
}
If the precondition of the function fails in some position, we translate the error to be the original well-formedness conditions on a map get operation.
Blaming the system
We declare that it is impossible that `ContextEverywhereFailedInPre` occurs, because we guarantee in the rewriter that the `context_everywhere` clause of the function we generate is `true`. It is important that we do not just silence the error.
Note: although this style of blame transformation works, it is too easy to silence errors this way. In fact, this snippet would still compile if the `PanicBlame` line was omitted entirely. We should consider making blames functional: `blame` should return a (list of) errors instead of invoking a further blame explicitly.
Blaming everybody else
For some positions where a contract is specified, you can also depend on which part of the contract is false. You can find these positions through the fact that they contain an `AccountedPredicate[G]` instead of an `Expr[G]`. They additionally report a sequence of `AccountedDirection` in the appropriate failure, which you can match on in the blame. In fact, you need to be careful to ensure that when the shape of an `AccountedPredicate` changes, the matching `blame` also accounts for the changed shape.
Typically a contract expression is added to the start or end of an accounted predicate. For example, `context_everywhere` is translated to a regular pre- and postcondition as such:
contract.rewrite(
contextEverywhere = tt,
requires = SplitAccountedPredicate(
UnitAccountedPredicate(freshInvariants()),
dispatch(contract.requires),
),
ensures = SplitAccountedPredicate(
UnitAccountedPredicate(freshInvariants()),
dispatch(contract.ensures),
),
)
Here the `context_everywhere` specifications are prepended, so we have to call the original blame when the accounted predicate path starts with `FailRight`:
inv.rewrite(blame =
PreBlameSplit
.left(ContextEverywherePreconditionFailed(inv), inv.blame)
)
Note that although there are only a few places where `AccountedPredicate` is used, this is really only for convenience: there is no strong reason to keep e.g. loop invariants an `Expr` other than convenience.
Blaming the user
So far we have looked at blames that blame other blames, but they do come from somewhere. All the transformations of parse trees to COL trees keep around a `blameProvider`, which explains how to create the initial blame. Note: although the constructed blame can optionally depend on the position, this is not usually used to determine the location in the source. Instead, the origin of the relevant node in the `VerificationFailure` is typically used. This design is a bit shaky and should be reviewed.
This process of constructing an initial blame is only done in the parse tree stage: from then on, new blames are not created. The typical way to provide blame is to pass a `ConstantBlameProvider` that gives out a constant `BlameCollector` for every blame: a facility that simply stores a list of `VerificationFailure`s in a buffer.
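For illustration, wiring up a collector might look like this; the `errs` field name and the parse invocation are assumptions in this sketch:

```scala
// Sketch: every blame handed out by the provider is the same collector, so
// all failures end up in one buffer.
val collector = BlameCollector()
val provider = ConstantBlameProvider(collector)

// Hypothetical parse call that attaches the provider to all created blames:
// val program = parse(inputFile, blameProvider = provider)

// After verification, inspect the collected failures:
collector.errs.foreach(failure => println(failure))
```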
Nested Rewriters
Occasionally it can be useful to nest rewriters. This essentially divorces the rewriting rules entirely from the parent rewriter when you invoke the nested rewriter: any match arms or other logic in `dispatch` implementations of the parent rewriter are not used inside the nested rewriter.
It is rare that a nested rewriter is actually what you want: usually it is more explicit and clear to achieve the desired behaviour with a piece of state in one rewriter. As an example we can consider the rewriter that collects inline patterns of quantifiers: its task is to scan for any patterns that occur on the inside of a quantifier, marked with `{: ... :}`, and apply them as triggers for the quantifier. It may be tempting to do this with a nested rewriter: we ignore patterns while recursing into the tree until we encounter e.g. a `Forall`, and then our behaviour changes entirely: we instead remove patterns and collect them into some buffer. Why not rewrite the body of the quantifier with a nested rewriter?
The problem becomes apparent when you consider that a `Forall` can also contain a nested `Forall`. When scanning for patterns we encounter a nested `Forall` while in the nested rewriter. Do we call to the outer rewriter again? We can apply another nesting level in the rewriter, but where do we save patterns that are in the nested `Forall`? What if we add support for triggers for `Exists`: do we add the pattern handling in both places?
Instead of implementing this behaviour with a nested rewriter it is more straightforward to maintain a stack of buffers that collect patterns, and have the match arms for `Forall` and `InlinePattern` on the same level. `Forall` puts a collection buffer on the stack, whereas `InlinePattern` adds itself to the buffer on the top of the stack. This is a bit more explicit and stateful, but it is a more honest admission of how your rewriter is structured: a nested rewriter just thinly veils this state.
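A hedged sketch of this structure; node constructor arities and the variable re-declaration are simplified assumptions here:

```scala
import scala.collection.mutable

// Sketch: one rewriter, one stack of trigger buffers, no nested rewriter.
case class CollectPatterns[Pre <: Generation]() extends Rewriter[Pre] {
  private val patterns = mutable.Stack[mutable.ArrayBuffer[Expr[Post]]]()

  override def dispatch(e: Expr[Pre]): Expr[Post] = e match {
    case f @ Forall(bindings, _, body) =>
      patterns.push(mutable.ArrayBuffer()) // open a buffer for this quantifier
      val newBody = dispatch(body)
      val triggers = patterns.pop().map(Seq(_)).toSeq
      Forall(variables.dispatch(bindings), triggers, newBody)(f.o)
    case InlinePattern(inner) =>
      val result = dispatch(inner)
      patterns.top += result // record the pattern in the nearest quantifier
      result
    case other => rewriteDefault(other)
  }
}
```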
Declaring a nested rewriter
By convention a nested rewriter is usually declared inside the parent rewriter it is used in:
```scala
case class OuterRewriter[Pre <: Generation]() extends Rewriter[Pre] {
  case class NestedRewriter[Pre <: Generation]() extends Rewriter[Pre] {
    // ...
  }
}
```
The only immediate concern is that of scopes. When communicating a tree from the parent to the nested rewriter, or vice versa, it is important that the rewriters agree on how to arrange declarations, and on whether succession records in the nested rewriter are visible to the parent rewriter.
The most straightforward way to arrange scopes and succession is to just let the nested rewriter inherit the `allScopes` field of the parent rewriter:
```scala
case class OuterRewriter[Pre <: Generation]() extends Rewriter[Pre] {
  outer =>
  // Note: NestedRewriter reuses the outer Pre here rather than declaring its
  // own type parameter, so that the type of outer.allScopes lines up.
  case class NestedRewriter() extends Rewriter[Pre] {
    override val allScopes = outer.allScopes
  }
}
```
This is the only reasonable option if it is required that the nested rewriter returns a tree of the `Post` generation. There is more flexibility if you can accept that the nested rewriter returns a `Pre`-tree. For example, the `Substitute` utility has an entirely fresh `allScopes` and uses it for declarations inside the tree that is being substituted, but returns references to declarations as-is when the successor is not known inside the substitution:
```scala
case class Substitute[G](/* ... */) extends NonLatchingRewriter[G, G] {
  case class SuccOrIdentity() extends SuccessorsProviderTrafo[G, G](allScopes) {
    override def postTransform[T <: Declaration[G]](
      pre: Declaration[G],
      post: Option[T],
    ): Option[T] = Some(post.getOrElse(pre.asInstanceOf[T]))
  }

  override def succProvider: SuccessorsProvider[G, G] = SuccOrIdentity()

  /* ... */
}
```
Notice that `Substitute` is thus not suitable when substituting a tree that contains declarations that must be scoped outside the tree that is being substituted. This is explained further in Substitution.
Another option is to recognize that the declarations of the tree being rewritten are entirely disjoint from the parent tree. This means that there are no references from the parent to the subtree, or from the subtree to the parent. An example of this is the simplifier: it loads rules from a file, which of course cannot contain a reference to the tree being simplified, nor can the subject tree refer to simplification rules it has not loaded. Thus: rewriting a rule in a nested rewriter with fresh scopes is perfectly fine.
Substitution
`vct.col.util.Substitute` is a utility that allows you to replace expressions and types according to a map. As noted in Nested Rewriters it is just a rewriter, but it is a bit more convenient to use safely.
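For illustration, a typical invocation might look like this; the shape of the first constructor parameter is taken on assumption:

```scala
// Sketch: replace occurrences of oldExpr by newExpr in body, entirely within
// the Pre generation (see rule 1 below).
val substituted: Expr[Pre] =
  Substitute(Map[Expr[Pre], Expr[Pre]](oldExpr -> newExpr)).dispatch(body)
```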
Rule 1: Substitution can only occur in the `Pre` generation.
Rule 2: Substitution can only be applied to structural nodes.
Since `Substitute` needs to be able to rewrite declarations in order to rewrite the types and expressions that occur in declarations, it cannot work on `Post` trees when they contain references that are not yet known. Additionally, you cannot perform substitutions in a tree that contains declarations that need to be declared outside the substitution. This means that generally you can apply `Substitute` to structural nodes, but not to declarations. It is perfectly possible to design nodes in such a way that a structural node (transitively) contains a declaration, but does not scope it. This is pretty rare, though an example is `IterVariable`.
This limitation reflects the fact that substituting a declaration that needs to be accessible outside the substitution necessarily imposes an obligation on the parent rewriter to explain what the successor of the declaration is. An alternative would be for `Substitute` to inherit the `allScopes` of the parent rewriter, but this would make `Substitute` a `Pre`-to-`Post` rewriter: we could no longer substitute again in the resulting tree, and it could not be further processed in the parent rewriter, limiting its use.
Backend and Errors
Our backend accepts a vastly reduced input language, and must call the appropriate blame when a verification goal fails:
```scala
trait Backend[P] {
  def transform(program: Program[_], output: Option[Path]): P
  def submit(intermediateProgram: P): Boolean
}
```
The `Backend` trait presumes that the input program must first be translated to the internal language of the backend. The boolean returned by `submit` indicates whether there were any verification failures at all. This is relevant because verification failures can be spurious: it is possible that with additional help or different random heuristics the proof might succeed. It is thus not safe to cache the result of a program verification when there are failures.
Although VerCors is designed in such a way that we could connect to a new backend, our current backend is Viper and it is unlikely this will change anytime soon.
Viper
Viper is an intermediate language and verification tool that accepts simple imperative programs with annotations in first-order permission-based separation logic. The input is straightforwardly translated to the Viper language, then submitted to one of Viper's two backends:
- Silicon, which uses a symbolic execution approach, and interrogates z3. Silicon also has flags that enable it to talk to cvc5, but so far this seems of limited use.
- Carbon, which uses verification condition generation, and submits a translated program to Boogie.
On balance Silicon is almost always faster, so most support in VerCors is geared towards that backend. It is also the default.
History
Earlier versions of VerCors encoded to Chalice, Boogie and Dafny. These approaches are in the git history of VerCors, but are otherwise lost to time. Some details as to what fits onto what these days:
- The carbon verifier of Viper still encodes to Boogie
- The original version of Chalice encodes to Boogie
- A newer implementation of Chalice encodes to Viper instead, though it is unclear whether this is maintained.
- Silver is an internal name for the Viper language that should no longer be used in writing: using it in code is fine, but prefer "Viper language" in prose.
Program Translation
To submit the program to Viper, the COL AST is transformed into a Viper AST. As opposed to the transformation steps, this is not done with a rewriter, but with explicit recursion. As in the parser transformation stage, we strive to maintain a standard of "no surprises" in the `ColToSilver` transformation: the COL AST should already nearly read like a Viper program before we translate it to Viper.
Info
Each node in Viper carries an `info` field that can be used to store additional data unrelated to verification. We store a `NodeInfo` in this field, which remembers which COL node caused the generation of this Viper node. Additionally it remembers parent nodes of the node that may be relevant for blame accounting.
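Conceptually it looks roughly like this; the concrete fields of the real `NodeInfo` differ, so treat this as a hedged sketch:

```scala
// Sketch: the real NodeInfo tracks a number of specific parent kinds rather
// than one generic field.
class NodeInfo[T <: Node[_]](val node: T) extends viper.silver.ast.Info {
  // A nearby ancestor that carries the blame for certain failures, e.g. the
  // invocation that a failing precondition belongs to:
  var invocation: Option[Node[_]] = None

  override def comment: Seq[String] = Nil
  override def isCached: Boolean = false
}
```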
Position
Each Viper node also remembers a position. This is typically a file path, line number and column, but since the Viper position is normally not visible to the user of VerCors, we set it to something that is useful in debugging. It contains a file position as text, and a `unique_id` that can be programmatically traced back to the node it corresponds to. This currently serves two purposes:
- Identical errors about identical nodes are deduplicated, but when they occur in a different position in the tree (that is: they have distinct positions) they are not deduplicated. There is still a bug open about this here, but there is presently no real issue, given that encoding different positions for different nodes is a valid solution.
- The position mostly survives when encoding to SMT-LIB, e.g. when quantifiers are named. In quantifier reports we can thus recover the COL node related to the quantifier by matching on the `unique_id` portion of its name.
Error Translation
Viper returns errors of type `AbstractVerificationError`, which have to be translated to an appropriate `VerificationFailure` and given to an appropriate blame.
Nearly all errors contain at least an `offendingNode` and an `errorReason`, where the error reason again contains an `offendingNode`. To find an appropriate blame we first find the `NodeInfo` associated with the `offendingNode`, or the `NodeInfo` of the error reason's `offendingNode`. In some cases the node contained in the `NodeInfo` is immediately the appropriate node with a blame; otherwise one of its fields denotes the nearby node that has the appropriate blame.
The translation of errors is trickier than you might expect. This is primarily due to the fact that VerCors makes a distinction between errors on fallible statements, and well-formedness problems that happen to occur in a fallible statement. For example, VerCors considers these different kinds of error:
- `fold p()` may fail because you do not have all the permissions in `p()`: this is a problem with the `fold` statement.
- `fold p(x / y)` may fail because `y` may be zero: this is a well-formedness problem, and is orthogonal to the `fold` statement.
Nevertheless, Viper considers both of these errors a `FoldFailed`. To determine what's what, we first have to inspect the type of the `errorReason`:
- `AssertionFalse` is always a problem with the node Viper is reporting on.
- `DivisionByZero`, `LabelledStatementNotReached`, `SeqIndexNegative`, `SeqIndexExceedsLength` and `MapKeyNotContained` are always well-formedness problems, and are reported on the node related to the `errorReason`.
- `InsufficientPermission` and `MagicWandChunkNotFound` depend on context. E.g. in `x.f := y.f`, when the insufficient permission is `x.f` the assignment is the problem, whereas for `y.f` the value expression is just not well-formed.
Finally there are two reasons that are a bit special in VerCors and Viper:
- `NegativePermission` and recently `NonPositivePermission`: this reason used to be embedded in the truth value of a contract, so `Perm(_, -1)` was previously simply equivalent to `false`. This was changed, and it is now a well-formedness problem: it belongs in the same category as `DivisionByZero` etc.
- `QPAssertionNotInjective` is currently a well-formedness condition by default, but this also used to be part of the truth value of a contract. This means that similarly an expression like `(\forall int i=0..2; Perm(x.f, 1\4))` would be equivalent to `false`. A difference with `NegativePermission` is that this behaviour is still recoverable with a backend flag: `--assumeInjectivityOnInhale`.
`NegativePermission` and `NonPositivePermission` are improperly supported in VerCors, and should be looked at. VerCors does not support `--assumeInjectivityOnInhale`, and error translation will likely break if you try to use it.
If you've been keeping count, there are between three and five error reasons that can make a contract false: `AssertionFalse`, `InsufficientPermission`, `MagicWandChunkNotFound`, and, depending on flags and Viper version, `NegativePermission` and `QPAssertionNotInjective`. In VerCors we call these a `ContractFailure`.
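As a hedged sketch, the distinction can be expressed as a match on the error reason; the reason names follow the list above, and the actual Silver class names and packages may differ slightly:

```scala
import viper.silver.verifier.ErrorReason

// Sketch only: classify a Viper error reason. `reasons` stands in for
// viper.silver.verifier.reasons here.
def isContractFailure(reason: ErrorReason): Boolean = reason match {
  case _: reasons.AssertionFalse => true          // makes the contract false
  case _: reasons.InsufficientPermission => true  // context-dependent, see above
  case _: reasons.MagicWandChunkNotFound => true  // context-dependent, see above
  case _ => false                                 // well-formedness problems
}
```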
All this can be summarized in the following table:
Reason | Error Node | Reason Node | Contract Failure
---|---|---|---
`DivisionByZero` | ❌ | ✅ | ❌
`LabelledStatementNotReached` | ❌ | ✅ | ❌
`SeqIndexNegative` | ❌ | ✅ | ❌
`SeqIndexExceedsLength` | ❌ | ✅ | ❌
`MapKeyNotContained` | ❌ | ✅ | ❌
`AssertionFalse` | ✅ | ❌ | ✅
`InsufficientPermission` | ✅ | ✅ | ✅
`MagicWandChunkNotFound` | ✅ | ✅ | ✅
`NegativePermission` | ❌ | ✅¹ | ❌¹
`NonPositivePermission` | ❌ | ✅² | ❌
`QPAssertionNotInjective` | ✅³ | ✅ | ✅³

¹ Currently still reported on the error node and treated as a contract failure.
² This error reason is not matched at all, and will likely crash VerCors if encountered.
³ Treating this as a contract failure with `--assumeInjectivityOnInhale` is as yet unsupported.
Utilities
This chapter describes a few utilities included with VerCors. A fair few utilities are implemented as a middleware:
Middleware
A middleware for VerCors is a concern that is cross-cutting, needs installing and potentially cleanup after finishing, and needs to be accessible globally for convenience. Examples of middlewares in VerCors are: logging, profiling, and progress printing. All have a start and an end with installation logic and cleanup, and it would be wildly inconvenient if we had to pass around a profiling object everywhere in the tool.
The motivation for making middleware an explicit abstraction is twofold:
LSP
Middleware will enable us to swap in different implementations for logging, progress, etc. when we implement VerCors as a language server. Currently middlewares are immediately implemented as a class, but we can easily fit an interface in between, and install different middlewares in console mode versus LSP mode.
Crashing and all that
It is helpful that certain clean-up tasks happen irrespective of whether VerCors finishes normally, the process is interrupted via Ctrl+C, or VerCors crashes. Finishing normally and crashing could share middleware cleanup via a `finally` block, but that does not cover stopping VerCors with Ctrl+C (and other signals). The only way to have clean-up behaviour in that case is to register a shutdown hook. The middleware abstraction simply makes this uniform: uninstall tasks always run, and they run in reverse order of how they were installed.
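A minimal sketch of the mechanism, assuming a hypothetical `Middleware` trait rather than the exact VerCors interface:

```scala
// Sketch: install in order, uninstall in reverse order, also on Ctrl+C.
trait Middleware {
  def install(): Unit
  def uninstall(): Unit
}

object Middlewares {
  private var installed: List[Middleware] = Nil

  // Registered once as a JVM shutdown hook, so cleanup also runs on SIGINT.
  sys.addShutdownHook { installed.foreach(_.uninstall()) }

  def install(mw: Middleware): Unit = {
    mw.install()
    installed = mw :: installed // most recent first = reverse uninstall order
  }
}
```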
Caching
The `hre.cache.Cache` utility allows you to create an on-disk cache directory that is determined by a specified list of keys. There is also a facility to tie the cache directory only to the process that the cache is being requested in, essentially reducing the cache directory to a temporary directory.
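A hypothetical usage sketch; the method name `getDirectory` and its parameters are assumptions for illustration, not the actual API:

```scala
import java.nio.file.Path

// Sketch: the keys determine the directory, so a later run with equal keys
// finds the same directory and can reuse its contents.
val cacheDir: Path = Cache.getDirectory(
  wantFresh = false, // true would tie the directory to this process only
  keys = Seq(BuildInfo.currentCommit, "silicon"), // hypothetical key values
)
```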
Usages
There are a few usages of caching:
- In `vct.cache.Caches` a directory is made to cache verification results from Carbon and Silicon. This behaviour is guarded behind the experimental option `--dev-cache`. An entry is stored in the cache if a program has no verification failures, in which case the program is serialized to disk.
- In `vct.cache.Caches` there is a directory that caches the result of parsing, resolving and normalizing a library file written in PVL. This is used for e.g. simplification rules and datatype definitions. The result is serialized to disk.
Profiling and Progress
Profiling and progress are middlewares that both depend on the task registry middleware.
Task Registry
The structure of a VerCors execution is tracked as a tree of tasks. Each parent task can have any number of subtasks, where the parent task cannot end before its subtasks. It is allowed and expected that a parent task has multiple active subtasks when the subtasks are executed concurrently. Tasks can be as fine-grained or coarse-grained as you like, but there is significant overhead to creating a task on the VerCors level, so consider only making new tasks e.g. around process interaction.
Each task explicitly defines what its supertask is by overriding `def superTask: AbstractTask`. In most cases there are utility methods in the `hre.progress.Progress` object that grab the current task as the parent for the subtask that is being entered, but if it is necessary to obtain the current task, it is available at `hre.progress.TaskRegistry.currentTaskInThread`. Note that the current task is stored in a `ThreadLocal`, meaning that while every thread individually has its own correct notion of the current task, the task is not accounted as the current task in any threads started from it. In that case the task either has to be obtained manually before starting a new thread, or you let a shorthand in the `Progress` object deal with this for you. In particular it is supported to put parallel collections in `Progress.map`, `Progress.foreach`, etc., as shown below.
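For illustration (the exact signature of `Progress.map` is simplified here, and `verifyFile` is a hypothetical helper):

```scala
import hre.progress.Progress

// Sketch: each element becomes a subtask of the current task, and the
// current-task bookkeeping is handled even for parallel collections.
val results = Progress.map(files, (f: java.io.File) => f.getName) { file =>
  verifyFile(file)
}
```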
Each task can optionally define a weight, which is used to increase the amount of progress done in the parent task. Note that each task stores a number between `0.0` and `1.0` to recall how far along it is, so the sum of the `progressWeight`s of the subtasks must not exceed `1.0`. You can define the weight of a task by overriding `def progressWeight: Option[Double]`.
Lastly, each task must define `profilingBreadcrumb` and `renderHere`, which are used in profiling and in progress rendering respectively.
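Putting the pieces together, a task definition might look roughly like this; the member names follow the descriptions above, but the details of `AbstractTask` and `ProgressRender` are simplified assumptions:

```scala
// Hedged sketch of a task; see hre.progress for the real traits.
case class VerifyFileTask(parent: AbstractTask, file: java.nio.file.Path)
    extends AbstractTask {
  override def superTask: AbstractTask = parent

  // We expect this task to be about a quarter of the parent's work:
  override def progressWeight: Option[Double] = Some(0.25)

  override def profilingBreadcrumb: String = s"verify ${file.getFileName}"

  override def renderHere: ProgressRender =
    ProgressRender(s"Verifying ${file.getFileName}")
}
```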
Profiling
VerCors has an internal capability to profile itself. Typically profilers trace the call stack, but we trace the tasks in the task registry. This is very helpful, because it enables us to do things like print the precise verification goal as a "stack" entry. The output format of a VerCors profile is pprof; see the wiki for usage information.
On supported platforms we use the `getrusage` syscall, which is supported on at least Linux and macOS. macOS does not support the feature where usage is also tracked for child processes that we have called `wait` on, so precise usage for `z3` is unfortunately not tracked on macOS. On Windows there is no support other than tracking wall time, though conceivably we could look into using `GetProcessTimes`.
The labels used in the stack of the profile are simply the trail of `profilingBreadcrumb` values towards the task that is being polled. Note that although the method that updates the profile is called `poll`, it is actually invoked exactly at the start and the end of a task: there is no timer that polls tasks on an interval.
Progress
Although VerCors does print a progress bar, what we really mean by progress is the principle that we should have some idea of what the tool is doing at all times. Ideally we are headed for a future where verifying a file is extremely performant and knowing what the tool is doing at a given moment becomes irrelevant, but in the meantime we have to deal with the reality that verifying programs can be minutes- or hours-slow for nearly inscrutable reasons.
To make things a bit more bearable, each task in the task registry offers a `ProgressRender` when `renderHere` is called. This is simply a list of text lines, together with a line index: the anchor to which we should point. In particular, tasks that relate to a point in the input print a message next to a sliver of the source, where the message is then the anchor line. A good approach to diagnosing slowness in VerCors is to make tasks more fine-grained, which helps users and developers in the future. A recent example of this is that it seemed that verification was slow to "start up," so now VerCors reports what it is working on just before starting verification (translation, diagnostic output, consistency checking or idempotency checking). You do not need to immediately understand what that means, but you can imagine it is helpful to receive a bug report "tool stuck on idempotency checking" rather than "tool stuck in verification."
A standing problem is that the performance of task accounting in the backend is bad: in unlucky cases we can spend up to 50% of the time just shuffling around tasks about the verification, rather than working on the verification. Nevertheless it has proven very useful to get an idea of which tasks take long, whether there is a large number of tasks or a large number of branches, which path conditions are particularly costly for a verification goal, etc. Note also that a large accounting overhead is indicative of a problem in and of itself: it implies that the number of different verification goals is very large, so the time-saving opportunity is most likely in the direction of reducing the number of verification goals (or the branches under which they are checked) anyway.
Importing
Importing a library definition from a file is a fairly straightforward affair. Similar to regular user input, a file is parsed, resolved and somewhat rewritten, after which it is placed in the subject tree.
The `vct.importer.Util` object defines `loadPVLLibraryFile`, which parses, resolves and disambiguates an input file. It also caches the result of this chain for later use. It is currently used for datatype definitions and simplification rules.
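A hedged usage sketch; the exact parameters of `loadPVLLibraryFile` and the library path are assumptions for illustration:

```scala
import java.nio.file.Paths
import vct.importer.Util

// Sketch: load a PVL library file once; repeated loads hit the cache.
val library = Util.loadPVLLibraryFile[Pre](
  Paths.get("res/adt/seq.pvl"), // hypothetical library path
  blameProvider,                // same provider style as regular user input
)
```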
Locating a declaration for reference in a library file is done by filtering on the `SourceName` of declarations. As opposed to its preferred name, the `SourceName` of a declaration is the formal name exactly as it appeared in the input. As a matter of style, this is only used when immediately placing the library in the AST, after which we no longer consider the `SourceName` reliable. For example, it is perfectly valid for user code to contain an overlapping `SourceName`.