Missing datafile definitions not caught #160

trevorb1 · 2023-04-19T23:05:46Z

Description

When reading in a MathProg datafile, all parameter/set definitions in the datafile also need to be defined in the config.yaml file. If a definition is present in the datafile, but not in the config.yaml file, then an AmplyError is raised. This logic does not work in reverse.

If the config.yaml file has parameter definitions not present in the datafile, then I would expect a warning/error to be raised. Instead, the parameter is added to the internal datastore with the default value defined in the config.yaml file.

Other read strategies raise an OtooleNameMismatchError in these instances.

How to replicate

Remove the parameter AccumulatedAnnualDemand from a MathProg datafile, and ensure the config.yaml file has the definition:

AccumulatedAnnualDemand:
    indices: [REGION,FUEL,YEAR]
    type: param
    dtype: float
    default: 0

Thoughts on Solution

We use the config.yaml file to first determine what parameters to search for in the datafile, then pass that into the Amply object. Therefore, we either need to reformulate this logic, or change how amply deals with missing parameters.

otoole/src/otoole/read_strategies.py

Lines 297 to 325 in 3c6f04e

    
class ReadDatafile(ReadStrategy):
    def read(
        self, filepath, **kwargs
    ) -> Tuple[Dict[str, pd.DataFrame], Dict[str, Any]]:

        config = self.user_config
        default_values = self._read_default_values(config)
        amply_datafile = self.read_in_datafile(filepath, config)
        inputs = self._convert_amply_to_dataframe(amply_datafile, config)
        for config_type in ["param", "set"]:
            inputs = self._get_missing_input_dataframes(inputs, config_type=config_type)
        inputs = self._check_index(inputs)
        return inputs, default_values

    def read_in_datafile(self, path_to_datafile: str, config: Dict) -> Amply:
        """Read in a datafile using the Amply parsing class

        Arguments
        ---------
        path_to_datafile: str
        config: Dict
        """
        parameter_definitions = self._load_parameter_definitions(config)
        datafile_parser = Amply(parameter_definitions)

        with open(path_to_datafile, "r") as datafile:
            datafile_parser.load_file(datafile)

        return datafile_parser
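
As a rough illustration of the "reformulate this logic" option, here is a minimal sketch of a check that could run before the Amply object is built. It assumes the names actually present in the datafile have already been collected by some other means. `find_missing_definitions`, `check_datafile_against_config` and `NameMismatchError` are hypothetical names used only for this example (the last is a stand-in for OtooleNameMismatchError, whose real signature is not shown in this issue).

```python
from typing import Any, Dict, Iterable, List


class NameMismatchError(Exception):
    """Hypothetical stand-in for otoole's OtooleNameMismatchError."""


def find_missing_definitions(
    config: Dict[str, Dict[str, Any]], datafile_names: Iterable[str]
) -> List[str]:
    """Return the params/sets defined in the config but absent from the datafile."""
    present = set(datafile_names)
    return sorted(
        name
        for name, definition in config.items()
        # Only params and sets are read from a datafile; other types are ignored.
        if definition.get("type") in ("param", "set") and name not in present
    )


def check_datafile_against_config(
    config: Dict[str, Dict[str, Any]], datafile_names: Iterable[str]
) -> None:
    """Raise if the config defines a param/set that the datafile does not."""
    missing = find_missing_definitions(config, datafile_names)
    if missing:
        raise NameMismatchError(
            f"Defined in config but not in datafile: {', '.join(missing)}"
        )
```

With the config from "How to replicate" and a datafile that lacks AccumulatedAnnualDemand, `check_datafile_against_config` would raise instead of silently filling in the default. Whether this should be an error or only a warning is still open.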

Related issues/PR

This is an edge case of issue #151, with the rest of the issue addressed in PR #157.

willu47 · 2023-04-20T07:19:29Z

One option is to use a regex to parse the datafile for parameter and set definitions and then check these against the config file prior to reading in the data with the amply parser.

Something like this script can be used to extract lists of sets, parameters and variables from a file. There are significant performance issues though - this is likely to be slow on a large datafile.

import re

def parse_gmpl_code(gmpl_code):
    # Initialize the variables to store the sets, parameters, and variables
    sets = {}
    parameters = {}
    variables = {}

    # Define regular expressions to match the different GMPL components
    set_regex = re.compile(r'set\s+(?P<set_name>[^\s;]+)\s*;')
    param_regex = re.compile(r'param\s+(?P<param_name>[A-Za-z]+)\s*(?P<symbolic>symbolic)?(?P<indices>\s*\{[^\}]*\})?\s*(?P<default>default\s+[^;]+)?\s*(?P<binary>binary)?[;:=]')

    var_regex = re.compile(r'var\s+(?P<var_name>[^\s;,]+)(?P<indices>\s*\{[^\}]*\})?\s*(?P<bounds>>=\s*[^\s;]+)?\s*;')

    # Parse the sets
    for match in set_regex.finditer(gmpl_code):
        set_name = match.group('set_name')
        sets[set_name] = []

    # Parse the parameters
    for match in param_regex.finditer(gmpl_code):
        param_name = match.group('param_name')
        indices = match.group('indices')
        default = match.group('default')

        if indices:
            # Parse indices
            indices = re.findall(r'\{([^\}]*)\}', indices)[0]
            indices = [i.strip() for i in indices.split(',')]
            parameters[param_name] = {'indices': indices}
        else:
            parameters[param_name] = {}

        if default:
            # Parse default value
            default = default.strip().split()[-1]
            parameters[param_name]['default'] = default

    # Parse the variables
    for match in var_regex.finditer(gmpl_code):
        var_name = match.group('var_name')
        indices = match.group('indices')
        bounds = match.group('bounds')

        if indices:
            # Parse indices
            indices = re.findall(r'\{([^\}]*)\}', indices)[0]
            indices = [i.strip() for i in indices.split(',')]
            variables[var_name] = {'indices': indices}
        else:
            variables[var_name] = {}

        if bounds:
            # Parse variable bounds
            # bounds = bounds.strip().split()[-1]
            variables[var_name]['bounds'] = bounds

    # Return the parsed sets, parameters, and variables
    return sets, parameters, variables


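# Read the whole file in one go; the regexes operate on a single string
with open('OSeMOSYS.txt', 'r') as textfile:
    osemosys = textfile.read()

# Avoid shadowing the built-in vars()
sets, params, variables = parse_gmpl_code(osemosys)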
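
For the datafile itself, a lighter variant might be enough, since only the names are needed for the comparison with the config. The sketch below scans line by line rather than running multi-group patterns over one large string, which may ease the performance concern (this is not benchmarked). It assumes every set/param statement starts on its own line and uses one of the two layouts shown in the comments. Multi-parameter tabbing blocks are not handled. `scan_datafile_names` is a hypothetical helper, not part of otoole or amply.

```python
import re
from typing import Iterable, Set, Tuple

# Assumed layouts of a data statement:
#   set REGION := SIMPLICITY;
#   param AccumulatedAnnualDemand default 0 :=
#   param default 0 : AccumulatedAnnualDemand :=
SET_RE = re.compile(r"^\s*set\s+([A-Za-z_]\w*)")
PARAM_RE = re.compile(
    r"^\s*param\s+(?:default\s+\S+\s*:\s*)?(?!default\b)([A-Za-z_]\w*)"
)


def scan_datafile_names(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Collect set and param names, reading one line at a time."""
    sets: Set[str] = set()
    params: Set[str] = set()
    for line in lines:
        set_match = SET_RE.match(line)
        if set_match:
            sets.add(set_match.group(1))
            continue
        param_match = PARAM_RE.match(line)
        if param_match:
            params.add(param_match.group(1))
    return sets, params
```

An open file handle can be passed directly as `lines`, so the datafile is never held in memory as a whole. The resulting names can then be compared against the keys of the config before handing the file to the amply parser.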

trevorb1 added the "bug" label Apr 19, 2023

trevorb1 mentioned this issue Apr 19, 2023

Config file not checking data defenitions from MathProg format #151

Closed

trevorb1 mentioned this issue Jun 21, 2023

Using old data files with Otoole #179

Open

willu47 mentioned this issue Jun 21, 2023

Validation of config.yaml against datafile #182

Open
