Memories

    Episodic Memories

    EpisodicExperienceReplay

    • class rl_coach.memories.episodic.EpisodicExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int] = (MemoryGranularity.Transitions, 1000000), n_step=-1, train_to_eval_ratio: int = 1)[source]
      • Parameters
      • max_size – the maximum number of transitions or episodes to hold in the memory
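
      A minimal construction sketch; the import path for EpisodicExperienceReplay is assumed from this page's sibling entries (rl_coach.memories.episodic), and MemoryGranularity comes from the annotated type above:

        from rl_coach.memories.episodic import EpisodicExperienceReplay
        from rl_coach.memories.memory import MemoryGranularity

        # keep at most 1,000,000 transitions in the buffer;
        # (MemoryGranularity.Episodes, N) would cap it by number of episodes instead
        memory = EpisodicExperienceReplay(max_size=(MemoryGranularity.Transitions, 1000000))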

    EpisodicHindsightExperienceReplay

    • class rl_coach.memories.episodic.EpisodicHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)
    • Implements Hindsight Experience Replay as described in the following paper: https://arxiv.org/abs/1707.01495

      • Parameters
        • max_size – The maximum size of the memory. Should be defined in a granularity of Transitions

        • hindsight_transitions_per_regular_transition – The number of hindsight artificial transitions to generate for each actual transition

        • hindsight_goal_selection_method – The method that will be used for generating the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

        • goals_space – A GoalsSpace which defines the base properties of the goals space
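
      A construction sketch. The GoalsSpace/ReachingGoal arguments and the goal name follow the Fetch HER preset and are illustrative assumptions, not part of this class's API:

        from rl_coach.memories.episodic import EpisodicHindsightExperienceReplay
        from rl_coach.memories.episodic.episodic_hindsight_experience_replay import HindsightGoalSelectionMethod
        from rl_coach.memories.memory import MemoryGranularity
        from rl_coach.spaces import GoalsSpace, ReachingGoal

        # sparse -1/0 reward; the goal counts as reached within a Euclidean distance of 0.05
        goals_space = GoalsSpace(goal_name='desired_goal',
                                 reward_type=ReachingGoal(distance_from_goal_threshold=0.05,
                                                          goal_reaching_reward=0,
                                                          default_reward=-1),
                                 distance_metric=GoalsSpace.DistanceMetric.Euclidean)

        # generate 4 artificial hindsight transitions per real transition, picking substitute
        # goals from states achieved later in the same episode ('future' strategy)
        memory = EpisodicHindsightExperienceReplay(
            max_size=(MemoryGranularity.Transitions, 1000000),
            hindsight_transitions_per_regular_transition=4,
            hindsight_goal_selection_method=HindsightGoalSelectionMethod.Future,
            goals_space=goals_space)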

    EpisodicHRLHindsightExperienceReplay

    • class rl_coach.memories.episodic.EpisodicHRLHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)
    • Implements HRL Hindsight Experience Replay as described in the following paper:

    This is the memory you should use if you want a shared hindsight experience replay buffer between multiple workers.

    • Parameters
      • max_size – The maximum size of the memory. Should be defined in a granularity of Transitions

      • hindsight_transitions_per_regular_transition – The number of hindsight artificial transitions to generate for each actual transition

      • hindsight_goal_selection_method – The method that will be used for generating the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

      • goals_space – A GoalsSpace which defines the properties of the goals

      • do_action_hindsight – Replace the action (sub-goal) given to a lower layer with the actual achieved goal

    Non-Episodic Memories

    BalancedExperienceReplay

    • class rl_coach.memories.non_episodic.BalancedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True, num_classes: int = 0, state_key_with_the_class_index: Any = 'class')[source]
      • Parameters
        • max_size – the maximum number of transitions or episodes to hold in the memory

        • num_classes – the number of classes in the replayed data

        • state_key_with_the_class_index – the class index is assumed to be a value in the state dictionary. This parameter determines the key to retrieve the class index value
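
      A construction sketch with illustrative values; the buffer balances sampled batches across the given number of classes, reading each transition's class index from its state dictionary:

        from rl_coach.memories.non_episodic import BalancedExperienceReplay
        from rl_coach.memories.memory import MemoryGranularity

        # sample batches evenly across 10 classes; each stored state dict is expected to hold
        # its class index under the 'class' key (the default state_key_with_the_class_index)
        memory = BalancedExperienceReplay(
            max_size=(MemoryGranularity.Transitions, 100000),
            num_classes=10,
            state_key_with_the_class_index='class')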

    QDND

    • class rl_coach.memories.non_episodic.QDND(dict_size, key_width, num_actions, new_value_shift_coefficient=0.1, key_error_threshold=0.01, learning_rate=0.01, num_neighbors=50, return_additional_data=False, override_existing_keys=False, rebuild_on_every_update=False)
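
      A construction sketch using the documented signature; QDND is the differentiable neural dictionary used by NEC-style agents, and the sizes below are illustrative:

        from rl_coach.memories.non_episodic import QDND

        # one dictionary per action: up to 500,000 keys per action, each key a 512-dim embedding;
        # stored values are nudged toward newly observed returns by new_value_shift_coefficient
        dnd = QDND(dict_size=500000,
                   key_width=512,
                   num_actions=6,
                   new_value_shift_coefficient=0.1,
                   key_error_threshold=0.01,
                   num_neighbors=50)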

    ExperienceReplay

    • class rl_coach.memories.non_episodic.ExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True)
      • Parameters
        • max_size – the maximum number of transitions or episodes to hold in the memory

        • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
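
      A sketch of filling and sampling the flat buffer, assuming the store/sample methods of the memory API and the Transition container from rl_coach.core_types (details may differ slightly across versions):

        import numpy as np

        from rl_coach.core_types import Transition
        from rl_coach.memories.memory import MemoryGranularity
        from rl_coach.memories.non_episodic import ExperienceReplay

        memory = ExperienceReplay(max_size=(MemoryGranularity.Transitions, 50000),
                                  allow_duplicates_in_batch_sampling=False)

        # states are dictionaries mapping observation names to arrays
        for i in range(100):
            memory.store(Transition(state={'observation': np.random.rand(4)},
                                    action=i % 2,
                                    reward=0.0,
                                    next_state={'observation': np.random.rand(4)},
                                    game_over=False))

        batch = memory.sample(32)  # a list of Transition objects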

    PrioritizedExperienceReplay

    • class rl_coach.memories.non_episodic.PrioritizedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], alpha: float = 0.6, beta: rl_coach.schedules.Schedule, epsilon: float = 1e-06, allow_duplicates_in_batch_sampling: bool = True)[source]
    • This is the proportional sampling variant of the prioritized experience replay as described in https://arxiv.org/pdf/1511.05952.pdf.

      • Parameters
        • max_size – the maximum number of transitions or episodes to hold in the memory

        • alpha – the alpha prioritization coefficient

        • beta – the beta parameter used for importance sampling

        • epsilon – a small value added to the priority of each transition
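
      A construction sketch; the LinearSchedule used for beta is an assumption (any rl_coach.schedules.Schedule can be passed) that anneals the importance-sampling correction from 0.4 to 1.0 as in the paper:

        from rl_coach.memories.memory import MemoryGranularity
        from rl_coach.memories.non_episodic import PrioritizedExperienceReplay
        from rl_coach.schedules import LinearSchedule

        memory = PrioritizedExperienceReplay(
            max_size=(MemoryGranularity.Transitions, 1000000),
            alpha=0.6,                               # how strongly priorities (TD-errors) skew sampling
            beta=LinearSchedule(0.4, 1.0, 1000000),  # importance-sampling exponent, annealed over 1M steps
            epsilon=1e-06)                           # keeps every transition's priority strictly positive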

    TransitionCollection

    • Simple Python implementation of the transitions collection that non-episodic memories are constructed on top of.