It looks like line 113 of Source/LexerParser/cmFortranParser.y does not handle a type before the function name, or 'pure', 'impure' or 'elemental' before function or subroutine:
```
| MODULE WORD other EOSTMT {
    cmFortranParser* parser = cmFortran_yyget_extra(yyscanner);
    if (cmsysString_strcasecmp($2, "function") != 0 &&
        cmsysString_strcasecmp($2, "procedure") != 0 &&
        cmsysString_strcasecmp($2, "subroutine") != 0) {
      cmFortranParser_RuleModule(parser, $2);
    }
```
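To make the failure concrete, here is a minimal C sketch that re-creates only the keyword filter in that action, assuming the lexer delivers the word right after MODULE as the WORD token. The names `current_rule_records_module` and `ieq` are hypothetical; `ieq` stands in for `cmsysString_strcasecmp(a, b) == 0`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive equality (Fortran keywords are case-insensitive);
   stands in for cmsysString_strcasecmp(a, b) == 0. */
static int ieq(const char* a, const char* b)
{
  while (*a != '\0' && *b != '\0') {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
      return 0;
    }
    ++a;
    ++b;
  }
  return *a == *b; /* equal only if both strings ended together */
}

/* Hypothetical re-creation of the filter in the MODULE WORD other EOSTMT
   action: returns 1 if the rule would record `word` as the name of a
   module provided by this file, 0 if it would ignore the statement. */
int current_rule_records_module(const char* word)
{
  assert(word != NULL);
  return !ieq(word, "function") && !ieq(word, "procedure") &&
    !ieq(word, "subroutine");
}
```

With this filter `module function f(x)` is ignored, but for `module pure function f(x)` the word after MODULE is `pure`, so `current_rule_records_module("pure")` returns 1 and a module named `pure` would be recorded.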
I think a more complete specification can be obtained from the draft standards (e.g. "Fortran 2018 Draft International Standard for Ballot (Cohen)" on https://wg5-fortran.org/documents.html):
prefix-spec is declaration-type-spec
or ELEMENTAL
or IMPURE
or MODULE
or NON_RECURSIVE
or PURE
or RECURSIVE
A prefix shall contain at most one of each prefix-spec.
A prefix shall not specify both PURE and IMPURE.
A prefix shall not specify both NON_RECURSIVE and RECURSIVE.
You can have any standard type in the declaration-type-spec: integer, real, complex, logical, character, and the legacy double precision. All of these except double precision can be parameterized, e.g. integer(2), real(kind=8), character(kind=1,len=10), character(10), and you can also use the legacy character*10 or character*(10) forms.
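The constraints above are simple to state as a check over the words of a prefix. The following C sketch validates a prefix against the three Fortran 2018 draft constraints; the enum names are hypothetical, and every declaration-type-spec (integer(2), real(kind=8), and so on) is collapsed into a single PS_TYPE value. It mainly illustrates that a prefix is a set of specs in any order, so the word right after MODULE can be any of them; the dependency scanner would only need to recognize such statements, not validate them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the Fortran 2018 prefix-specs; PS_TYPE stands
   for any declaration-type-spec. */
enum prefix_spec {
  PS_TYPE,
  PS_ELEMENTAL,
  PS_IMPURE,
  PS_MODULE,
  PS_NON_RECURSIVE,
  PS_PURE,
  PS_RECURSIVE,
  PS_COUNT /* number of prefix-specs, not a prefix-spec itself */
};

/* Returns 1 if the n prefix-specs in `prefix` satisfy the constraints
   quoted above, 0 otherwise. */
int prefix_is_valid(const enum prefix_spec* prefix, size_t n)
{
  int seen[PS_COUNT] = { 0 };
  assert(prefix != NULL || n == 0);
  for (size_t i = 0; i < n; ++i) {
    if (seen[prefix[i]]++) {
      return 0; /* at most one of each prefix-spec */
    }
  }
  if (seen[PS_PURE] && seen[PS_IMPURE]) {
    return 0; /* not both PURE and IMPURE */
  }
  if (seen[PS_NON_RECURSIVE] && seen[PS_RECURSIVE]) {
    return 0; /* not both NON_RECURSIVE and RECURSIVE */
  }
  return 1;
}
```

For example, `{ PS_MODULE, PS_PURE, PS_TYPE }` (as in `module pure real function f(x)`) is accepted, while `{ PS_PURE, PS_IMPURE }` is rejected.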
Thanks. I had looked at the Fortran spec when introducing support for submodules and found that the grammar is quite complex. I'd prefer not to have to introduce a full Fortran parser for this.
Not being an expert in Fortran, I was hoping the hack you quoted would be sufficient. Unfortunately, it looks like a closer approximation will be needed.
IIUC use of the module keyword as part of a function prefix can only appear inside an INTERFACE block (which we already filter out) or after a CONTAINS. One could look at teaching the parser to recognize the latter to filter out later appearances of module. That would avoid needing to parse the entire function grammar.
I had a look at the standard, and agree with you that this should be true.
A workaround for users is to not start the prefix with module, i.e. to start with a different prefix-spec instead (potentially hard to do if you're using result() in a function, or if it's a subroutine).
I'll look at the parser and see if I can make sense of it... if so I may open an MR.
Here is the relevant text from the 2008 standard:
12.6 Procedure definition
12.6.1 Intrinsic procedure definition
Intrinsic procedures are defined as an inherent part of the processor. A standard-conforming processor shall include the intrinsic procedures described in Clause 13, but may include others. However, a standard-conforming program shall not make use of intrinsic procedures other than those described in Clause 13.
12.6.2 Procedures defined by subprograms
12.6.2.1 General
A subprogram defines one or more procedures. A procedure is defined by the initial SUBROUTINE or FUNCTION statement, and each ENTRY statement defines an additional procedure (12.6.2.6).
A subprogram is specified to be elemental (12.8), pure (12.7), recursive, or a separate module subprogram (12.6.2.5) by a prefix-spec in its initial SUBROUTINE or FUNCTION statement.
R1225 prefix is prefix-spec [ prefix-spec ] ...
R1226 prefix-spec is declaration-type-spec
or ELEMENTAL
or IMPURE
or MODULE
or PURE
or RECURSIVE
C1242 (R1225) A prefix shall contain at most one of each prefix-spec.
C1243 (R1225) A prefix shall not specify both PURE and IMPURE.
C1244 (R1225) A prefix shall not specify both ELEMENTAL and RECURSIVE.
C1245 An elemental procedure shall not have the BIND attribute.
C1246 (R1225) MODULE shall appear only in the function-stmt or subroutine-stmt of a module subprogram or of a nonabstract interface body that is declared in the scoping unit of a module or submodule.
C1247 (R1225) If MODULE appears in the prefix of a module subprogram, an accessible separate interface body (12.6.2.5) shall appear in the specification part of the module or submodule in which the subprogram appears, or shall appear in an ancestor of that program unit.
C1248 (R1225) If MODULE appears in the prefix of a module subprogram, it shall have been declared to be a separate module procedure in the containing program unit or an ancestor of that program unit.
C1249 (R1225) If MODULE appears in the prefix of a module subprogram, the subprogram shall specify the same characteristics and dummy argument names as its corresponding separate interface body.
C1250 (R1225) If MODULE appears in the prefix of a module subprogram and a binding label is specified, it shall be the same as the binding label specified in the corresponding separate interface body.
C1251 (R1225) If MODULE appears in the prefix of a module subprogram, RECURSIVE shall appear if and only if RECURSIVE appears in the prefix in the corresponding separate interface body.
The RECURSIVE prefix-spec shall appear if any procedure defined by the subprogram directly or indirectly invokes itself or any other procedure defined by the subprogram.
If the prefix-spec PURE appears, or the prefix-spec ELEMENTAL appears and IMPURE does not appear, the subprogram is a pure subprogram and shall meet the additional constraints of 12.7.
If the prefix-spec ELEMENTAL appears, the subprogram is an elemental subprogram and shall meet the additional constraints of 12.8.1.
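The purity rule in the quoted text reduces to a small boolean expression. A C sketch, using a hypothetical `prefix_flags` struct that records which of these keywords appear in the prefix:

```c
#include <assert.h>

/* Hypothetical record of which of these prefix-specs appear in a prefix. */
struct prefix_flags {
  int pure;
  int impure;
  int elemental;
};

/* Returns 1 if the subprogram is pure per the quoted 2008 text: PURE
   appears, or ELEMENTAL appears and IMPURE does not. */
int subprogram_is_pure(struct prefix_flags f)
{
  assert(!(f.pure && f.impure)); /* C1243: never both */
  return f.pure || (f.elemental && !f.impure);
}
```

So an `elemental function` is pure by default, and `impure elemental function` opts out.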
Thanks for reporting this example. Is there an exhaustive set of types that can go there?
I think adding checks for
pure
impure
elemental
recursive
<default types>
type(...)
to Source/LexerParser/cmFortranParser.y would be a step in the right direction. I'm not sure whether module names like pure or real are allowed; if they are, this proposed fix would break code that names a module after one of the prefix-specs added to the parser.
But without a full grammar/AST I'm not sure there is an easy fix to all of this. I definitely think adding pure, impure, elemental and recursive is worth doing, however.
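A sketch of that proposed check in C: the word after MODULE is compared, case-insensitively, against the procedure keywords the rule already checks plus the words listed above, with `non_recursive` added from the 2018 prefix-spec list. It assumes (not verified against the CMake lexer) that `type(...)` and `double precision` reach the rule with `type` and `double` as the word after MODULE. Since Fortran has no reserved words, a module genuinely named `real` or `pure` would then be skipped, which is exactly the risk raised above. The function and table names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive equality; stands in for cmsysString_strcasecmp == 0. */
static int ieq(const char* a, const char* b)
{
  while (*a != '\0' && *b != '\0') {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
      return 0;
    }
    ++a;
    ++b;
  }
  return *a == *b;
}

/* Words that, right after MODULE, mean the statement is not a module
   definition. Assumes "double precision" and "type(...)" arrive here as
   "double" and "type". */
static const char* const not_module_names[] = {
  "function", "procedure", "subroutine",
  "pure", "impure", "elemental", "recursive", "non_recursive",
  "integer", "real", "complex", "logical", "character", "double", "type",
};

/* Hypothetical proposed filter: returns 1 if `MODULE word ...` should
   still be recorded as defining a module named `word`. */
int proposed_rule_records_module(const char* word)
{
  size_t count = sizeof not_module_names / sizeof not_module_names[0];
  assert(word != NULL);
  for (size_t i = 0; i < count; ++i) {
    if (ieq(word, not_module_names[i])) {
      return 0;
    }
  }
  return 1;
}
```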
Checking for MODULE (WORD)* (function|subroutine|procedure) (stuff)* EOSTMT would be ideal, as long as the procedure/function/subroutine keywords are not in a comment or after the end of a statement (i.e., multiple statements on one line). I'm not familiar with bison/yacc/flex or with specifying grammars and parsers, so in the pattern above * is used in the regular-expression sense, meaning zero or more instances; parentheses group elements, and a group whose alternatives are separated by | means exactly one of those values must appear.
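That pattern can be approximated by scanning the words of the statement that follow MODULE. This C sketch assumes the words have already been split out, with comments and any further statements on the same line removed; `statement_is_module_procedure` is a hypothetical name. It is looser than the pattern: it accepts a procedure keyword at any position rather than only right after the prefix words.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive equality; stands in for cmsysString_strcasecmp == 0. */
static int ieq(const char* a, const char* b)
{
  while (*a != '\0' && *b != '\0') {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
      return 0;
    }
    ++a;
    ++b;
  }
  return *a == *b;
}

/* Loose version of MODULE (WORD)* (function|subroutine|procedure)
   (stuff)* EOSTMT. `words` are the n words following MODULE in one
   statement, with comments and later statements on the same line already
   stripped (assumed to be done by the lexer). Returns 1 if any of them is
   a procedure keyword, i.e. the statement is not a module definition. */
int statement_is_module_procedure(const char* const words[], size_t n)
{
  assert(words != NULL || n == 0);
  for (size_t i = 0; i < n; ++i) {
    if (ieq(words[i], "function") || ieq(words[i], "subroutine") ||
        ieq(words[i], "procedure")) {
      return 1;
    }
  }
  return 0;
}
```

For `module pure function f(x)` the words are `pure`, `function`, `f` and the result is 1; for `module my_mod` it is 0, so the module name is recorded as before.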
@zbeekman if we recognize CONTAINS then we shouldn't have to parse the module ... function ... grammar at all. It's the same reason we track "is in INTERFACE" in the parser.
I completely agree, but I didn't immediately see how to implement that. One must also be careful if multiple (sub)modules or programs are defined in the same file, and with the caveat that CONTAINS can appear in procedures too... But that is probably the fastest and best route if it can be done accurately. I'll look at the interface handling again to see if I can figure out enough to open a PR, but I wouldn't hold your breath; I'm super slammed with work and have never used bison/flex before.
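Here is a rough C sketch of the CONTAINS-tracking idea that also covers the two caveats above: several (sub)modules in one file, and CONTAINS inside procedures. It keeps a stack of open scopes with a per-scope "seen CONTAINS" flag, and treats a MODULE statement as a module definition only if the innermost open scope has not reached its CONTAINS section. The statement classes and function names are hypothetical; it assumes the lexer can tell an END that closes a program unit or procedure from END IF, END TYPE, END INTERFACE and similar, and that INTERFACE blocks are already filtered out as the parser does today.

```c
#include <assert.h>

/* Hypothetical statement classes reported by the lexer. */
enum stmt_kind {
  STMT_MODULE,    /* any statement starting with MODULE                  */
  STMT_SUBMODULE, /* SUBMODULE (parent) name                             */
  STMT_UNIT,      /* PROGRAM, or FUNCTION/SUBROUTINE without MODULE      */
  STMT_CONTAINS,
  STMT_END,       /* END closing a program unit or procedure             */
  STMT_OTHER      /* everything else, including END IF, END TYPE, ...    */
};

#define MAX_DEPTH 32

struct scope_tracker {
  int depth;                     /* number of open scopes                */
  int after_contains[MAX_DEPTH]; /* CONTAINS seen in scope i?            */
};

static void push_scope(struct scope_tracker* t)
{
  assert(t->depth < MAX_DEPTH);
  t->after_contains[t->depth++] = 0;
}

/* Feeds one statement; returns 1 only when it is a MODULE statement that
   defines a module (so its name should be recorded as provided). */
int track_statement(struct scope_tracker* t, enum stmt_kind kind)
{
  int defines_module = 0;
  switch (kind) {
    case STMT_MODULE:
      /* After CONTAINS, MODULE is a procedure prefix (module function,
         module subroutine, module procedure); otherwise it starts a
         module. Either way a new scope opens until the matching END. */
      defines_module = !(t->depth > 0 && t->after_contains[t->depth - 1]);
      push_scope(t);
      break;
    case STMT_SUBMODULE:
    case STMT_UNIT:
      push_scope(t);
      break;
    case STMT_CONTAINS:
      if (t->depth > 0) {
        t->after_contains[t->depth - 1] = 1;
      }
      break;
    case STMT_END:
      if (t->depth > 0) {
        --t->depth;
      }
      break;
    case STMT_OTHER:
      break;
  }
  return defines_module;
}
```

Feeding `module m`, `contains`, `module function f(x)`, `end function`, `end module`, `module n` returns 1, 0, 0, 0, 0, 1: both `m` and `n` are recorded, and the separate module procedure is not mistaken for a module.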
I guess the point of my comment above was that there is lower-hanging fruit for a partial/hackish fix, although this may still be beyond my capability. Anyway, thanks for the response, @brad.king, and thanks for having another look.
My concern with trying to parse the module function grammar is that it is huge and any partial implementation will inevitably run into cases it doesn't handle and then the solution will be "just add this one little bit more" repeated over and over. It will end up being more complex than the rest of the parser we already have. It shouldn't be hard to recognize CONTAINS at the right place, but I'm not familiar enough with Fortran to know exactly where it goes.
This issue is very annoying, and it would be great to see it fixed. In the meantime, I managed to create a very hacky workaround: in your CMakeLists.txt for the code that uses Fortran submodules, add the following lines:
where ${LIB} is the name of the library. Adjust the filenames according to which errors you are seeing. When using gfortran I got away with creating empty files using file(TOUCH), but with the Intel compiler I had to add some content to the file, hence file(WRITE) and "dummy".
EDIT: While this works for me most of the time, it also fails from time to time. I'm not sure exactly what's causing the failures, so use with care!