Macroprocessor

machineauthorcommitcommit dateplatformtimechecksumcheckrel timebonus

Your task is to implement a class representing the state of a macroprocessor with the following interface:

		template< typename policy>
		class macroprocessor {
		public:
			bool process(const std::string& s, std::string& output);
		};
	
The function process shall analyze the string s containing an input line (without any line delimiters).
If this line is a definition of a macro, the function shall save the definition in the private data of the macroprocessor class and return false.
Otherwise, the function shall expand all macros found in the input line (recursively) place the result in the string output and return true.

The class shall be a template parameterized with one of the policy classes describing the vectorization instruction set supported.
For this class, vectorization is probably useless, therefore, you may avoid some template-related problems by rewriting the interface as follows:

		class macroprocessor_impl {
		public:
			bool process(const std::string& s, std::string& output);
		};

		template< typename policy>
		class macroprocessor : public macroprocessor_impl {};
	

The macroprocessor class will not be copied or moved, i.e. copy/move support may be disabled.

Detailed specification

All input lines are composed of words, white-space, and (in the case of macro definition) the character '='. Words are sequences of letters, i.e. [a-zA-Z]+. White-space is a sequence of spaces, i.e. [ ]+. Adjacent words must be separated by white-space, in addition, there may be white-space at the start/end of a line and next to the '=' sign. Lines will never contain a newline character, nor any other character not mentioned above.

The output lines shall contain only words, separated by single spaces. (The correctness is verified by hashing the output lines.)

The definition of a macro is a line containing a '=' after the first word (optionally interleaved by whitespace). The first word is the name of the macro being defined, the (possibly empty) remaining sequence of words is the body of the macro. Macro names are case-significant. The same name may be used more than once, i.e. macro definitions may be changed (but not deleted).

Macro expansion, i.e. the replacement of macro names by macro bodies, is done only during the (recursive) processing of an input line which is not a macro definition. Macro definitions themselves must be stored without expansion, because a subsequent macro definition may change the expansion of a word in a macro body. The expansion process logically works with sequences of words, i.e. words are never glued together and the amount of white-space between them is irrelevant.

The inputs will never contain recursive macros, i.e. no detection of/protection against recursion is required.

The following input

  EF   =  GO  G GO
ABC =  abc EF def 
	  BFLM ABC D EF
	  G = GAGA 
      BFLM ABC D EF
  G =
	  BFLM ABC D EF
shall produce the following output
BFLM abc GO G GO def D GO G GO
BFLM abc GO GAGA GO def D GO GAGA GO
BFLM abc GO GO def D GO GO

Hints

The interface of std::[unordered_]map<K,T> contains the function

		iterator find( const K &);
	
When used as std::[unordered_]map<std::string,T>, it means that the sequence of characters being searched must be converted to std::string, invoking allocation and copying of the characters.
A more effective solution may be based on std::[unordered_]map<std::string_view,T>; however, the ownership of the character sequences (i.e. macro names) stored in the map must be solved somehow because string_view does not act as the owner.

Test parameters

nonmacro_weight indirectly sets the probability that a word is a macro name, iterated through the set { 3, 12, 48 } which roughly corresponds to probabilities { 0.5, 0.2, 0.06 }. Note that being a macro name does not necessarily mean that the macro was already defined because the macro definitions are randomly scattered among the lines; consequently, the true frequency of defined macros is lower, especially within the first lines. In Debug mode, only { 3, 48 } is used.

exp_width is the expected number of words in a line (or the right-hand side of a macro), iterated through the set { 3, 12 }. The numbers of words are randomly generated according to an exponential distribution which means that there is significant variability with both small and large outliers.

lines is an auto-adjusted parameter, determining the number of lines, i.e. calls to process. The range of the parameter is set so that the expected run time ranges from fractions of a second to seconds (stopped by the auto-adjustment mechanism after exceeding a second).

Testing

With the command-line parameter --dump=on, the testing framework will produce a number of input.*.txt and output.*.txt files in the current directory (a pair of files for each pair of the nonmacro_weight and exp_width parameters and for each thread). These files contain the first 1000 input lines generated by the framework and the corresponding output generated by your solution. (The files are generated before the actual performance measurement starts, the generator is restarted and the macroprocessor class recreated before the actual measurement.)

Some generated inputs and their reference outputs may be found here: macro-gold.zip