Information Units

Information Units are the foundational components of EMOS that provide access to data, generation capabilities, and prediction models. They are designed with standardized interfaces to ensure consistency and enable easy integration within Features.

Overview

EMOS includes three types of Information Units:

  • Databases: Access to external materials databases and repositories

  • Generators: AI-powered tools for generating new materials and structures

  • Predictors: Machine learning models for property prediction and analysis

Each type follows a common design pattern with:

  • Standardized base classes

  • Factory-based instantiation

  • Consistent method interfaces

  • Error handling and logging

Design Principles

Standardized Interface

All Information Units implement standard base classes:

# Base classes define common interface
class BaseDatabase:
    def __init__(self, database_name, logger=None)
    def info(self) -> str
    def retrieve(self, inputs: dict) -> str

class BaseGenerator:
    def __init__(self, generator_name, logger=None)
    def info(self) -> str  
    def generate(self, inputs: dict) -> str

class BasePredictor:
    def __init__(self, predictor_name, logger=None)
    def info(self) -> str
    def predict(self, inputs: dict) -> str

Factory Pattern

Information Units are registered in factory dictionaries for dynamic instantiation:

# Factory dictionaries enable easy extension
database_factory = {
    "icsd": ICSD,
    "mp": MP,
    # Easy to add new databases
}

generator_factory = {
    "mattergen": MatterGen,
    "gnome": GNoME,
    # Easy to add new generators
}

predictor_factory = {
    "m3gnet": M3GNet,
    "mattersim": MatterSim,
    # Easy to add new predictors
}

Consistent Usage Pattern

All Information Units follow the same usage pattern within Features:

# Standard pattern for using Information Units
active_databases = inputs.get('active_databases', [])
for db_config in active_databases:
    db_key = db_config['value']
    if db_key in database_factory:
        db_instance = database_factory[db_key](db_key, self.logger)
        result = db_instance.retrieve(retrieve_inputs)

Adding New Information Units

The modular design makes it easy to add new Information Units:

  1. Implement Base Class: Create new class inheriting from appropriate base

  2. Register in Factory: Add to corresponding factory dictionary

  3. No Changes Required: Existing Features automatically discover new units

Example of adding a new database:

# 1. Implement the database
class NewDatabase(BaseDatabase):
    def info(self):
        return "NewDatabase: Description of capabilities"
    
    def retrieve(self, inputs):
        # Implementation here
        return results

# 2. Register in factory
database_factory["newdb"] = NewDatabase

# 3. Automatically available in all Features

This modular approach ensures that EMOS can easily grow and adapt to new tools and databases as they become available.