Missing datafile definitions not caught #160

Open
trevorb1 opened this issue Apr 19, 2023 · 1 comment
Labels
bug Something isn't working

Comments

Member

trevorb1 commented Apr 19, 2023

Description

When reading in a MathProg datafile, every parameter/set definition in the datafile must also be defined in the config.yaml file. If a definition is present in the datafile but not in the config.yaml file, an AmplyError is raised. The check does not work in reverse.

If the config.yaml file has parameter definitions not present in the datafile, then I would expect a warning/error to be raised. Instead, the parameter is added to the internal datastore with the default value defined in the config.yaml file.

Other read strategies raise an OtooleNameMismatchError in these instances.

How to replicate

Remove the parameter AccumulatedAnnualDemand from a MathProg datafile, and ensure the config.yaml file has the definition:

AccumulatedAnnualDemand:
    indices: [REGION,FUEL,YEAR]
    type: param
    dtype: float
    default: 0

Thoughts on Solution

We use the config.yaml file to first determine what parameters to search for in the datafile, then pass that into the Amply object. Therefore, we either need to reformulate this logic, or change how amply deals with missing parameters.

class ReadDatafile(ReadStrategy):
    def read(
        self, filepath, **kwargs
    ) -> Tuple[Dict[str, pd.DataFrame], Dict[str, Any]]:
        config = self.user_config
        default_values = self._read_default_values(config)
        amply_datafile = self.read_in_datafile(filepath, config)
        inputs = self._convert_amply_to_dataframe(amply_datafile, config)
        for config_type in ["param", "set"]:
            inputs = self._get_missing_input_dataframes(inputs, config_type=config_type)
        inputs = self._check_index(inputs)
        return inputs, default_values

    def read_in_datafile(self, path_to_datafile: str, config: Dict) -> Amply:
        """Read in a datafile using the Amply parsing class

        Arguments
        ---------
        path_to_datafile: str
        config: Dict
        """
        parameter_definitions = self._load_parameter_definitions(config)
        datafile_parser = Amply(parameter_definitions)
        with open(path_to_datafile, "r") as datafile:
            datafile_parser.load_file(datafile)
        return datafile_parser
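
As a rough sketch of the first option (reformulating the logic), a reverse check could compare the names defined in config.yaml against the names actually found in the datafile and raise if any are missing. Everything here is hypothetical: `check_config_against_datafile` and `NameMismatchError` are illustrative names, the exception is only a stand-in for OtooleNameMismatchError (whose real signature is not assumed), and how the list of names found in the datafile would be obtained from Amply is left open.

```python
from typing import Dict, Iterable, List


class NameMismatchError(Exception):
    """Stand-in for otoole's OtooleNameMismatchError (real signature not assumed)."""


def check_config_against_datafile(
    config: Dict[str, dict], datafile_names: Iterable[str]
) -> None:
    """Raise if the config defines params/sets that the datafile lacks."""
    found = set(datafile_names)
    # Only "param" and "set" entries are expected to appear in a datafile
    missing: List[str] = sorted(
        name
        for name, definition in config.items()
        if definition.get("type") in ("param", "set") and name not in found
    )
    if missing:
        raise NameMismatchError(
            f"Defined in config but missing from datafile: {', '.join(missing)}"
        )


# Toy config mirroring the replication case above
config = {
    "AccumulatedAnnualDemand": {
        "indices": ["REGION", "FUEL", "YEAR"],
        "type": "param",
        "dtype": "float",
        "default": 0,
    },
    "REGION": {"type": "set", "dtype": "str"},
}

try:
    # The datafile only defines REGION, so the parameter is reported
    check_config_against_datafile(config, datafile_names=["REGION"])
except NameMismatchError as err:
    message = str(err)
    print(message)
```

Whether this should be an error or a warning is a design choice; raising would match what the other read strategies do.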

Related issues/PR

This is an edge case of issue #151, with the rest of the issue addressed in PR #157.

Member

willu47 commented Apr 20, 2023

One option is to use a regex to parse the datafile for parameter and set definitions and then check these against the config file prior to reading in the data with the amply parser.

Something like this script can be used to extract lists of sets, parameters and variables from a file. There are significant performance issues though - this is likely to be slow on a large datafile.

import re

def parse_gmpl_code(gmpl_code):
    # Initialize the variables to store the sets, parameters, and variables
    sets = {}
    parameters = {}
    variables = {}

    # Define regular expressions to match the different GMPL components
    set_regex = re.compile(r'set\s+(?P<set_name>[^\s;]+)\s*;')
    param_regex = re.compile(r'param\s+(?P<param_name>[A-Za-z]+)\s*(?P<symbolic>symbolic)?(?P<indices>\s*\{[^\}]*\})?\s*(?P<default>default\s+[^;]+)?\s*(?P<binary>binary)?[;:=]')

    var_regex = re.compile(r'var\s+(?P<var_name>[^\s;,]+)(?P<indices>\s*\{[^\}]*\})?\s*(?P<bounds>>=\s*[^\s;]+)?\s*;')

    # Parse the sets
    for match in set_regex.finditer(gmpl_code):
        set_name = match.group('set_name')
        sets[set_name] = []

    # Parse the parameters
    for match in param_regex.finditer(gmpl_code):
        param_name = match.group('param_name')
        indices = match.group('indices')
        default = match.group('default')

        if indices:
            # Parse indices
            indices = re.findall(r'\{([^\}]*)\}', indices)[0]
            indices = [i.strip() for i in indices.split(',')]
            parameters[param_name] = {'indices': indices}
        else:
            parameters[param_name] = {}

        if default:
            # Parse default value
            default = default.strip().split()[-1]
            parameters[param_name]['default'] = default

    # Parse the variables
    for match in var_regex.finditer(gmpl_code):
        var_name = match.group('var_name')
        indices = match.group('indices')
        bounds = match.group('bounds')

        if indices:
            # Parse indices
            indices = re.findall(r'\{([^\}]*)\}', indices)[0]
            indices = [i.strip() for i in indices.split(',')]
            variables[var_name] = {'indices': indices}
        else:
            variables[var_name] = {}

        if bounds:
            # Parse variable bounds
            # bounds = bounds.strip().split()[-1]
            variables[var_name]['bounds'] = bounds

    # Return the parsed sets, parameters, and variables
    return sets, parameters, variables


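# Read the whole model file in one call and parse it
with open('OSeMOSYS.txt', 'r') as textfile:
    gmpl_code = textfile.read()

# Named "variables" rather than "vars" to avoid shadowing the built-in vars()
sets, params, variables = parse_gmpl_code(gmpl_code)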
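
For the pre-check itself, only the names are needed, so a lighter variant might scan the datafile line by line and compare the names it finds with the config before handing the file to amply. This is a minimal sketch, not tested against real datafiles: `scan_definition_names` and `compare_with_config` are hypothetical helpers, and the regex assumes every definition starts a line with `set NAME` or `param NAME` (other layouts, such as `param default 0 : NAME :=`, would need extra handling).

```python
import io
import re
from typing import Dict, Iterable, Set, Tuple

# Assumption: a definition begins a line as "set NAME" or "param NAME"
DEFINITION = re.compile(r"^\s*(?:set|param)\s+([A-Za-z_][A-Za-z0-9_]*)")


def scan_definition_names(lines: Iterable[str]) -> Set[str]:
    """Collect set/param names, reading one line at a time."""
    names = set()
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            names.add(match.group(1))
    return names


def compare_with_config(
    config: Dict[str, dict], lines: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (names only in the config, names only in the datafile)."""
    in_datafile = scan_definition_names(lines)
    in_config = {
        name
        for name, definition in config.items()
        if definition.get("type") in ("param", "set")
    }
    return in_config - in_datafile, in_datafile - in_config


# Toy datafile; an open file object could be passed in the same way
datafile = io.StringIO(
    "# toy data\n"
    "set REGION := SIMPLICITY;\n"
    "set YEAR := 2014 2015;\n"
    "param DiscountRate default 0.05 :=\n"
    ";\n"
    "end;\n"
)
config = {
    "REGION": {"type": "set"},
    "YEAR": {"type": "set"},
    "AccumulatedAnnualDemand": {"type": "param", "default": 0},
}

missing_from_datafile, missing_from_config = compare_with_config(config, datafile)
print(missing_from_datafile, missing_from_config)
```

Because the scan anchors each match at the start of a line and never holds the whole file in memory, it may be cheaper on large datafiles than matching against one big string, though that has not been benchmarked here.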
