(*:maxLineLen=78:*)

theory Outer_Syntax
  imports Main Base
begin

chapter ‹Outer syntax --- the theory language \label{ch:outer-syntax}›

text ‹
  The rather generic framework of Isabelle/Isar syntax emerges from three main
  syntactic categories: ‹commands› of the top-level Isar engine (covering
  theory and proof elements), ‹methods› for general goal refinements
  (analogous to traditional ``tactics''), and ‹attributes› for operations on
  facts (within a certain context). Subsequently we give a reference of basic
  syntactic entities underlying Isabelle/Isar syntax in a bottom-up manner.
  Concrete theory and proof language elements will be introduced later on.

  
  In order to get started with writing well-formed Isabelle/Isar documents,
  the most important aspect to be noted is the difference between ‹inner› and
  ‹outer› syntax. Inner syntax is that of Isabelle types and terms of the
  logic, while outer syntax is that of Isabelle/Isar theory sources
  (specifications and proofs). As a general rule, inner syntax entities may
  occur only as ‹atomic entities› within outer syntax. For example, the
  string ‹"x + y"› and identifier ‹z› are legal term specifications within a
  theory, while ‹x + y› without quotes is not.

  Printed theory documents usually omit quotes to gain readability (this is a
  matter of {\LaTeX} macro setup, say via ‹\isabellestyle›, see also
  cite‹"isabelle-system"›). Experienced users of Isabelle/Isar may easily
  reconstruct the lost technical information, while mere readers need not care
  about quotes at all.
›
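
text ‹
  For illustration, the following diagnostic commands are accepted in the
  present context (theory Main is imported here): the quoted string passes
  the inner term as a single outer token, while a plain identifier is already
  atomic. By contrast, ‹term x + y› without quotes is rejected by the outer
  parser.
›

term "x + y"  ― ‹inner syntax, quoted as a single token›
term z  ― ‹atomic identifier, no quotes required›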


section ‹Commands›

text ‹
  \begin{matharray}{rcl}
    @{command_def "print_commands"}* & : & any →› \\
    @{command_def "help"}* & : & any →› \\
  \end{matharray}

  rail‹@@{command help} (@{syntax name} * )
  ›

   @{command "print_commands"} prints all outer syntax keywords
  and commands.

   @{command "help"}~pats› retrieves outer syntax
  commands according to the specified name patterns.
›


subsubsection ‹Examples›

text ‹
  Some common diagnostic commands are retrieved like this (according to usual
  naming conventions):
›

help "print"
help "find"


section ‹Lexical matters \label{sec:outer-lex}›

text ‹
  The outer lexical syntax consists of three main categories of syntax tokens:

     ‹major keywords› --- the command names that are available
    in the present logic session;

     ‹minor keywords› --- additional literal tokens required
    by the syntax of commands;

     ‹named tokens› --- various categories of identifiers etc.

  Major keywords and minor keywords are guaranteed to be disjoint. This helps
  user-interfaces to determine the overall structure of a theory text, without
  knowing the full details of command syntax. Internally, there is some
  additional information about the kind of major keywords, which approximates
  the command type (theory command, proof command etc.).

  Keywords override named tokens. For example, the presence of a command
  called ‹term› inhibits the identifier ‹term›, but the string ‹"term"› can
  be used instead. By convention, the outer syntax always allows quoted
  strings in addition to identifiers, wherever a named entity is expected.

  When tokenizing a given input sequence, the lexer repeatedly takes the
  longest prefix of the input that forms a valid token. Spaces, tabs, newlines
  and formfeeds between tokens serve as explicit separators.

  
  The categories for named tokens are defined once and for all as follows.

  \begin{center}
  \begin{supertabular}{rcl}
    @{syntax_def short_ident} & = & letter (subscript? quasiletter)* \\
    @{syntax_def long_ident} & = & short_ident(›‹.›short_ident)+ \\
    @{syntax_def sym_ident} & = & sym+  |›~~‹\›‹<›short_ident›‹>› \\
    @{syntax_def nat} & = & digit+ \\
    @{syntax_def float} & = & @{syntax_ref nat}‹.›@{syntax_ref nat}~~|›~~‹-›@{syntax_ref nat}‹.›@{syntax_ref nat} \\
    @{syntax_def term_var} & = & ‹?›short_ident  |›~~‹?›short_ident›‹.›nat› \\
    @{syntax_def type_ident} & = & ‹'›short_ident› \\
    @{syntax_def type_var} & = & ‹?›type_ident  |›~~‹?›type_ident›‹.›nat› \\
    @{syntax_def string} & = & ‹"› …› ‹"› \\
    @{syntax_def altstring} & = & ‹`› …› ‹`› \\
    @{syntax_def cartouche} & = & @{verbatim "‹"} …› @{verbatim "›"} \\
    @{syntax_def verbatim} & = & ‹{*› …› ‹*}› \\[1ex]

    letter› & = & latin  |›~~‹\›‹<›latin›‹>›~~|›~~‹\›‹<›latin latin›‹>›~~|  greek  |› \\
    subscript› & = & ‹⇩› \\
    quasiletter› & = & letter  |  digit  |›~~‹_›~~|›~~‹'› \\
    latin› & = & ‹a›~~| … |›~~‹z›~~|›~~‹A›~~|  … |›~~‹Z› \\
    digit› & = & ‹0›~~|  … |›~~‹9› \\
    sym› & = & ‹!›~~|›~~‹#›~~|›~~‹$›~~|›~~‹%›~~|›~~‹&›~~|›~~‹*›~~|›~~‹+›~~|›~~‹-›~~|›~~‹/›~~|› \\
    & & ‹<›~~|›~~‹=›~~|›~~‹>›~~|›~~‹?›~~|›~~‹@›~~|›~~‹^›~~|›~~‹_›~~|›~~‹|›~~|›~~‹~› \\
    greek› & = & ‹α›~~|›~~‹β›~~|›~~‹γ›~~|›~~‹δ›~~|› \\
          &   & ‹ε›~~|›~~‹ζ›~~|›~~‹η›~~|›~~‹θ›~~|› \\
          &   & ‹ι›~~|›~~‹κ›~~|›~~‹μ›~~|›~~‹ν›~~|› \\
          &   & ‹ξ›~~|›~~‹π›~~|›~~‹ρ›~~|›~~‹σ›~~|›~~‹τ›~~|› \\
          &   & ‹υ›~~|›~~‹φ›~~|›~~‹χ›~~|›~~‹ψ›~~|› \\
          &   & ‹ω›~~|›~~‹Γ›~~|›~~‹Δ›~~|›~~‹Θ›~~|› \\
          &   & ‹Λ›~~|›~~‹Ξ›~~|›~~‹Π›~~|›~~‹Σ›~~|› \\
          &   & ‹Υ›~~|›~~‹Φ›~~|›~~‹Ψ›~~|›~~‹Ω› \\
  \end{supertabular}
  \end{center}

  A @{syntax_ref term_var} or @{syntax_ref type_var} describes an unknown,
  which is internally a pair of base name and index (ML type
  ML_type‹indexname›). These components are either separated by a dot as in
  ‹?x.1› or ‹?x7.3›, or run together as in ‹?x1›. The latter form is possible
  if the base name does not end with digits. If the index is 0, it may be
  dropped altogether: ‹?x› and ‹?x0› and ‹?x.0› all refer to the same
  unknown, with base name ‹x› and index 0 (see the example at the end of this
  section).

  The syntax of @{syntax_ref string} admits any characters, including
  newlines; ``‹"›'' (double-quote) and ``‹\›'' (backslash) need to be
  escaped by a backslash; arbitrary character codes may be specified as
  ``‹\›ddd›'', with three decimal digits. Alternative strings according to
  @{syntax_ref altstring} are analogous, using single back-quotes instead.

  The body of @{syntax_ref verbatim} may consist of any text not containing
  ``‹*}›''; this allows quotes to be included without further escapes, but
  there is no way to escape ``‹*}›''. Cartouches do not have this limitation.

  A @{syntax_ref cartouche} consists of arbitrary text, with properly balanced
  blocks of ``@{verbatim "‹"}~…›~@{verbatim "›"}''. Note that the rendering
  of cartouche delimiters is usually like this: ``‹ … ››''.

  Source comments take the form ‹(*›~…›~‹*)› and may be nested: the text is
  removed after lexical analysis of the input and thus not suitable for
  documentation. The Isar syntax also provides proper ‹document comments›
  that are considered as part of the text (see \secref{sec:comments}).

  Common mathematical symbols such as ‹∀› are represented in Isabelle source
  text as ‹\<forall>›.
  There are infinitely many Isabelle symbols like this, although proper
  presentation is left to front-end tools such as {\LaTeX} or Isabelle/jEdit.
  A list of predefined Isabelle symbols that work well with these tools is
  given in \appref{app:symbols}. Note that ‹λ› does not belong to the
  letter› category, since it is already used differently in the Pure term
  language.
›
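
text ‹
  For illustration, inspecting an existing fact shows unknowns as described
  above: the index 0 is dropped in the printed form.
›

thm refl  ― ‹prints ‹?t = ?t›: an unknown with base name ‹t› and index 0›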


section ‹Common syntax entities›

text ‹
  We now introduce several basic syntactic entities, such as names, terms, and
  theorem specifications, which are factored out of the actual Isar language
  elements to be described later.
›


subsection ‹Names›

text ‹
  Entity @{syntax name} usually refers to any name of types, constants,
  theorems etc.\ Quoted strings provide an escape for non-identifier names or
  those ruled out by outer syntax keywords (e.g.\ quoted ‹"let"›).

  rail‹@{syntax_def name}: @{syntax short_ident} | @{syntax long_ident} |
      @{syntax sym_ident} | @{syntax nat} | @{syntax string}
    ;
    @{syntax_def par_name}: '(' @{syntax name} ')'
  ›

  A @{syntax_def system_name} is like @{syntax name}, but it excludes
  white-space characters and needs to conform to file-name notation. Name
  components that are special on Windows (e.g.\ ‹CON›, ‹PRN›, ‹AUX›) are
  excluded on all platforms.
›
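
text ‹
  For illustration, a fact may be bound to a name that coincides with an
  outer keyword by quoting it; the binding ‹"where"› below is a hypothetical
  name chosen only to demonstrate this escape.
›

lemmas "where" = refl  ― ‹the bare identifier ‹where› would be rejected›
thm "where"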


subsection ‹Numbers›

text ‹
  The outer lexical syntax (\secref{sec:outer-lex}) admits natural numbers and
  floating point numbers. These are combined as @{syntax int} and @{syntax
  real} as follows.

  rail‹@{syntax_def int}: @{syntax nat} | '-' @{syntax nat}
    ;
    @{syntax_def real}: @{syntax float} | @{syntax int}
  ›

  Note that there is an overlap with the category @{syntax name}, which also
  includes @{syntax nat}.
›


subsection ‹Embedded content›

text ‹
  Entity @{syntax embedded} refers to content of other languages: cartouches
  allow arbitrary nesting of sub-languages that respect the recursive
  balancing of cartouche delimiters. Quoted strings are possible as well, but
  require escaped quotes when nested. As a shortcut, tokens that appear as
  plain identifiers in the outer language may be used as inner language
  content without delimiters.

  rail‹@{syntax_def embedded}: @{syntax cartouche} | @{syntax string} |
      @{syntax short_ident} | @{syntax long_ident} | @{syntax sym_ident} |
      @{syntax term_var} | @{syntax type_ident} | @{syntax type_var} | @{syntax nat}
  ›
›
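
text ‹
  For illustration, a cartouche may be used directly wherever embedded
  content is expected, e.g.\ as the argument of the ‹term› command:
›

term ‹x + y›  ― ‹embedded inner syntax without a quoted string›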


subsection ‹Document text›

text ‹
  A chunk of document @{syntax text} is usually given as @{syntax cartouche}
  ‹‹…››. For convenience, any of the smaller text units that conform to
  @{syntax name} is admitted as well.

  rail‹@{syntax_def text}: @{syntax embedded}
  ›

  Typical uses are document markup commands, like chapter, section etc.
  (\secref{sec:markup}).
›


subsection ‹Document comments \label{sec:comments}›

text ‹
  Formal comments are an integral part of the document, but are logically void
  and removed from the resulting theory or term content. The output of
  document preparation (\chref{ch:document-prep}) supports various styles,
  according to the following kinds of comments.

     Marginal comment of the form ‹―›~‹text›› or ―›~‹text››, usually with
    a single space between the comment symbol and the argument cartouche. The
    given argument is typeset as regular text, with formal antiquotations
    (\secref{sec:antiq}).

     Canceled text of the form ‹⌦›‹text›› (no white space between the
    control symbol and the argument cartouche). The argument is typeset as
    formal Isabelle source and overlaid with a ``strike-through'' pattern,
    e.g. ⌦‹bad›.

     Raw {\LaTeX} source of the form latex‹argument›› (no white space
    between the control symbol and the argument cartouche). This allows the
    generated {\TeX} source to be augmented arbitrarily, without any sanity
    checks!

  These formal comments work uniformly in outer syntax, inner syntax (term
  language), Isabelle/ML, and some other embedded languages of Isabelle.
›
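
text ‹
  For illustration, a marginal comment attached to a command:
›

typ "nat ⇒ bool" ― ‹this comment is typeset as regular document text›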


subsection ‹Type classes, sorts and arities›

text ‹
  Classes are specified by plain names. Sorts have a very simple inner syntax,
  which is either a single class name c› or a list {c1, …, cn}› referring
  to the intersection of these classes. The syntax of type arities is given
  directly at the outer level.

  rail‹@{syntax_def classdecl}: @{syntax name} (('<' | '⊆') (@{syntax name} + ','))?
    ;
    @{syntax_def sort}: @{syntax embedded}
    ;
    @{syntax_def arity}: ('(' (@{syntax sort} + ',') ')')? @{syntax sort}
  ›
›


subsection ‹Types and terms \label{sec:types-terms}›

text ‹
  The actual inner Isabelle syntax, that of types and terms of the logic, is
  far too sophisticated to be modelled explicitly at the outer theory
  level. Basically, any such entity has to be quoted to turn it into a single
  token (the parsing and type-checking is performed internally later). For
  convenience, a slightly more liberal convention is adopted: quotes may be
  omitted for any type or term that is already atomic at the outer level. For
  example, one may just write ‹x› instead of quoted ‹"x"›. Note that
  symbolic identifiers (e.g.\ ‹++› or ‹∀›) are available as well, provided
  these have not been superseded by commands or other keywords already (such
  as ‹=› or ‹+›).

  rail‹@{syntax_def type}: @{syntax embedded}
    ;
    @{syntax_def term}: @{syntax embedded}
    ;
    @{syntax_def prop}: @{syntax embedded}
  ›

  Positional instantiations are specified as a sequence of terms, or the
  placeholder ``_›'' (underscore), which means to skip a position.

  rail‹@{syntax_def inst}: '_' | @{syntax term}
    ;
    @{syntax_def insts}: (@{syntax inst} *)
  ›

  Named instantiations are specified as pairs of assignments v = t›, which
  refer to schematic variables in some theorem that is instantiated. Both type
  and term instantiations are admitted, and distinguished by the usual syntax
  of variable names.

  rail‹@{syntax_def named_inst}: variable '=' (type | term)
    ;
    @{syntax_def named_insts}: (named_inst @'and' +)
    ;
    variable: @{syntax name} | @{syntax term_var} | @{syntax type_ident} | @{syntax type_var}
  ›

  Type declarations and definitions usually refer to @{syntax typespec} on the
  left-hand side. This models basic type constructor application at the outer
  syntax level. Note that only plain postfix notation is available here, but
  no infixes.

  rail‹@{syntax_def typeargs}:
      (() | @{syntax type_ident} | '(' ( @{syntax type_ident} + ',' ) ')')
    ;
    @{syntax_def typeargs_sorts}:
      (() | (@{syntax type_ident} ('::' @{syntax sort})?) |
        '(' ( (@{syntax type_ident} ('::' @{syntax sort})?) + ',' ) ')')
    ;
    @{syntax_def typespec}: @{syntax typeargs} @{syntax name}
    ;
    @{syntax_def typespec_sorts}: @{syntax typeargs_sorts} @{syntax name}
  ›
›
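
text ‹
  For illustration, positional and named instantiations occur as arguments of
  the standard attributes ‹of› and ‹where›:
›

thm conjI [of A B]  ― ‹positional instantiation of ‹?P› and ‹?Q››
thm conjI [of _ B]  ― ‹the underscore skips the first position›
thm conjI [where Q = B]  ― ‹named instantiation of ‹?Q› only›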


subsection ‹Term patterns and declarations \label{sec:term-decls}›

text ‹
  Wherever explicit propositions (or term fragments) occur in a proof text,
  casual binding of schematic term variables may be specified via
  patterns of the form ``(is p1 … pn)''. This works both for @{syntax
  term} and @{syntax prop}.

  rail‹@{syntax_def term_pat}: '(' (@'is' @{syntax term} +) ')'
    ;
    @{syntax_def prop_pat}: '(' (@'is' @{syntax prop} +) ')'
  ›

  
  Declarations of local variables x :: τ› and logical propositions a : φ›
  represent different views on the same principle of introducing a local
  scope. In practice, one may usually omit the typing of @{syntax vars} (due
  to type-inference), and the naming of propositions (due to implicit
  references of current facts). In any case, Isar proof elements usually allow
  multiple such items to be introduced simultaneously.

  rail‹@{syntax_def vars}:
      (((@{syntax name} +) ('::' @{syntax type})? |
        @{syntax name} ('::' @{syntax type})? @{syntax mixfix}) + @'and')
    ;
    @{syntax_def props}: @{syntax thmdecl}? (@{syntax prop} @{syntax prop_pat}? +)
    ;
    @{syntax_def props'}: (@{syntax prop} @{syntax prop_pat}? +)
  ›

  The treatment of multiple declarations corresponds to the complementary
  focus of @{syntax vars} versus @{syntax props}. In ``x1 … xn :: τ›'' the
  typing refers to all variables, while in a: φ1 … φn the naming refers to
  all propositions collectively. Isar language elements that refer to @{syntax
  vars} or @{syntax props} typically admit separate typings or namings via
  another level of iteration, with explicit @{keyword_ref "and"} separators;
  e.g.\ see @{command "fix"} and @{command "assume"} in
  \secref{sec:proof-context}.
›
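
text ‹
  For illustration, a detached proof body (command ‹notepad›) with a term
  pattern and simultaneously declared variables:
›

notepad
begin
  fix x y :: nat
  have "x + y = y + x" (is "?lhs = ?rhs")
    by (rule add.commute)
  then have "?rhs = ?lhs"
    by (rule sym)
end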


subsection ‹Attributes and theorems \label{sec:syn-att}›

text ‹
  Attributes have their own ``semi-inner'' syntax, in the sense that input
  conforming to @{syntax args} below is parsed by the attribute a second time.
  The attribute argument specifications may be any sequence of atomic entities
  (identifiers, strings etc.), or properly bracketed argument lists. Below
  @{syntax atom} refers to any atomic entity, including any @{syntax keyword}
  conforming to @{syntax sym_ident}.

  rail‹@{syntax_def atom}: @{syntax name} | @{syntax type_ident} |
      @{syntax type_var} | @{syntax term_var} | @{syntax nat} | @{syntax float} |
      @{syntax keyword} | @{syntax cartouche}
    ;
    arg: @{syntax atom} | '(' @{syntax args} ')' | '[' @{syntax args} ']'
    ;
    @{syntax_def args}: arg *
    ;
    @{syntax_def attributes}: '[' (@{syntax name} @{syntax args} * ',') ']'
  ›

  Theorem specifications come in several flavors: @{syntax axmdecl} and
  @{syntax thmdecl} usually refer to axioms, assumptions or results of goal
  statements, while @{syntax thmdef} collects lists of existing theorems.
  Existing theorems are given by @{syntax thm} and @{syntax thms}, where the
  former requires an actual singleton result.

  There are three forms of theorem references:

     named facts a›,

     selections from named facts a(i)› or a(j - k)›,

     literal fact propositions using token syntax @{syntax_ref altstring}
    ‹`›φ›‹`› or @{syntax_ref cartouche}
    ‹φ›› (see also method @{method_ref fact}).

  Any kind of theorem specification may include lists of attributes both on
  the left and right hand sides; attributes are applied to any immediately
  preceding fact. If names are omitted, the theorems are not stored within the
  theorem database of the theory or proof context, but any given attributes
  are applied nonetheless.

  An extra pair of brackets around attributes (like ``[[simproc a]]›'')
  abbreviates a theorem reference involving an internal dummy fact, which will
  be ignored later on. So only the effect of the attribute on the background
  context will persist. This form of in-place declarations is particularly
  useful with commands like @{command "declare"} and @{command "using"}.

  rail‹@{syntax_def axmdecl}: @{syntax name} @{syntax attributes}? ':'
    ;
    @{syntax_def thmbind}:
      @{syntax name} @{syntax attributes} | @{syntax name} | @{syntax attributes}
    ;
    @{syntax_def thmdecl}: thmbind ':'
    ;
    @{syntax_def thmdef}: thmbind '='
    ;
    @{syntax_def thm}:
      (@{syntax name} selection? | @{syntax altstring} | @{syntax cartouche})
        @{syntax attributes}? |
      '[' @{syntax attributes} ']'
    ;
    @{syntax_def thms}: @{syntax thm} +
    ;
    selection: '(' ((@{syntax nat} | @{syntax nat} '-' @{syntax nat}?) + ',') ')'
  ›
›
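
text ‹
  For illustration, some theorem references with selection, attributes, and
  an in-place declaration; the binding ‹my_conjI› is a hypothetical name.
›

thm simp_thms(1)  ― ‹selection from the named fact list ‹simp_thms››
lemmas my_conjI = conjI [OF TrueI]  ― ‹new fact binding with an attribute applied›
declare [[show_question_marks = true]]  ― ‹in-place declaration (the default setting anyway)›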


subsection ‹Structured specifications›

text ‹
  Structured specifications use propositions with explicit notation for the
  ``eigen-context'' to describe rule structure: ⋀x. A x ⟹ … ⟹ B x› is
  expressed as ‹B x if A x and … for x›. It is also possible to use dummy
  terms ``_›'' (underscore) to refer to locally fixed variables anonymously.

  Multiple specifications are delimited by ``|›'' to emphasize separate
  cases, each with its own scope of inferred types for free variables.


  rail‹@{syntax_def for_fixes}: (@'for' @{syntax vars})?
    ;
    @{syntax_def multi_specs}: (@{syntax structured_spec} + '|')
    ;
    @{syntax_def structured_spec}:
      @{syntax thmdecl}? @{syntax prop} @{syntax spec_prems} @{syntax for_fixes}
    ;
    @{syntax_def spec_prems}: (@'if' ((@{syntax prop}+) + @'and'))?
    ;
    @{syntax_def specification}: @{syntax vars} @'where' @{syntax multi_specs}
  ›
›
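
text ‹
  For illustration, a structured specification as used by the ‹inductive›
  command; the predicate name ‹reach› is hypothetical.
›

inductive reach :: "('a ⇒ 'a ⇒ bool) ⇒ 'a ⇒ 'a ⇒ bool" for R where
  base: "reach R x x" |
  step: "reach R x z" if "R x y" and "reach R y z" for x y z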


section ‹Diagnostic commands›

text ‹
  \begin{matharray}{rcl}
    @{command_def "print_theory"}* & : & context →› \\
    @{command_def "print_definitions"}* & : & context →› \\
    @{command_def "print_methods"}* & : & context →› \\
    @{command_def "print_attributes"}* & : & context →› \\
    @{command_def "print_theorems"}* & : & context →› \\
    @{command_def "find_theorems"}* & : & context →› \\
    @{command_def "find_consts"}* & : & context →› \\
    @{command_def "thm_deps"}* & : & context →› \\
    @{command_def "unused_thms"}* & : & context →› \\
    @{command_def "print_facts"}* & : & context →› \\
    @{command_def "print_term_bindings"}* & : & context →› \\
  \end{matharray}

  rail‹
    (@@{command print_theory} |
      @@{command print_definitions} |
      @@{command print_methods} |
      @@{command print_attributes} |
      @@{command print_theorems} |
      @@{command print_facts}) ('!'?)
    ;
    @@{command find_theorems} ('(' @{syntax nat}? 'with_dups'? ')')?  (thm_criterion*)
    ;
    thm_criterion: ('-'?) ('name' ':' @{syntax name} | 'intro' | 'elim' | 'dest' |
      'solves' | 'simp' ':' @{syntax term} | @{syntax term})
    ;
    @@{command find_consts} (const_criterion*)
    ;
    const_criterion: ('-'?)
      ('name' ':' @{syntax name} | 'strict' ':' @{syntax type} | @{syntax type})
    ;
    @@{command thm_deps} @{syntax thmrefs}
    ;
    @@{command unused_thms} ((@{syntax name} +) '-' (@{syntax name} * ))?
  ›

  These commands print certain parts of the theory and proof context. Note
  that there are some further ones available, such as for the set of rules
  declared for simplification.

   @{command "print_theory"} prints the main logical content of the
  background theory; the ``!›'' option indicates extra verbosity.

   @{command "print_definitions"} prints dependencies of definitional
  specifications within the background theory, which may be constants
  (\secref{sec:term-definitions}, \secref{sec:overloading}) or types
  (\secref{sec:types-pure}, \secref{sec:hol-typedef}); the ``!›'' option
  indicates extra verbosity.

   @{command "print_methods"} prints all proof methods available in the
  current theory context; the ``!›'' option indicates extra verbosity.

   @{command "print_attributes"} prints all attributes available in the
  current theory context; the ``!›'' option indicates extra verbosity.

   @{command "print_theorems"} prints theorems of the background theory
  resulting from the last command; the ``!›'' option indicates extra
  verbosity.

   @{command "print_facts"} prints all local facts of the current context,
  both named and unnamed ones; the ``!›'' option indicates extra verbosity.

   @{command "print_term_bindings"} prints all term bindings that are present
  in the context.

   @{command "find_theorems"}~criteria› retrieves facts from the theory or
  proof context matching all of the given search criteria. The criterion
  name: p› selects all theorems whose fully qualified name matches pattern
  p›, which
  may contain ``*›'' wildcards. The criteria intro›, elim›, and dest›
  select theorems that match the current goal as introduction, elimination or
  destruction rules, respectively. The criterion solves› returns all rules
  that would directly solve the current goal. The criterion simp: t› selects
  all rewrite rules whose left-hand side matches the given term. The criterion
  term t› selects all theorems that contain the pattern t› -- as usual,
  patterns may contain occurrences of the dummy ``_›'', schematic variables,
  and type constraints.

  Criteria can be preceded by ``-›'' to select theorems that do ‹not› match.
  Note that giving the empty list of criteria yields ‹all› currently known
  facts. An optional limit for the number of printed facts may be given; the
  default is 40. By default, duplicates are removed from the search result.
  Use with_dups› to display duplicates.

   @{command "find_consts"}~criteria› prints all constants whose type meets
  all of the given criteria. The criterion strict: ty› is met by any type
  that matches the type pattern ty›. Patterns may contain both the dummy type
  ``_›'' and sort constraints. The criterion ty› is similar, but it also
  matches against subtypes. The criterion name: p› and the prefix ``-›''
  function as described for @{command "find_theorems"}.

   @{command "thm_deps"}~thms› prints immediate theorem dependencies, i.e.\
  the union of all theorems that are used directly to prove the argument
  facts, without going deeper into the dependency graph.

   @{command "unused_thms"}~A1 … Am - B1 … Bn displays all theorems
  that are proved in theories B1 … Bn or their parents but not in A1 …
  Am or their parents and that are never used. If n› is 0›, the end of the
  range of theories defaults to the current theory. If no range is specified,
  only the unused theorems in the current theory are displayed.
›
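
text ‹
  For illustration, some invocations of these diagnostic commands:
›

find_theorems name: conjI  ― ‹facts whose name contains ‹conjI››
find_consts "_ ⇒ nat"  ― ‹constants with a subtype matching the pattern›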

end