Memories

    Episodic Memories

    EpisodicExperienceReplay

    • class rl_coach.memories.episodic.EpisodicExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int] = (MemoryGranularity.Transitions, 1000000), n_step=-1, train_to_eval_ratio: int = 1)[source]
      • Parameters
      • max_size – the maximum number of transitions or episodes to hold in the memory
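
      A minimal construction sketch; the import path for EpisodicExperienceReplay is assumed from this page's sibling entries (rl_coach.memories.episodic), and MemoryGranularity comes from the annotated type above:

        from rl_coach.memories.episodic import EpisodicExperienceReplay
        from rl_coach.memories.memory import MemoryGranularity

        # keep at most 1,000,000 transitions in the buffer;
        # (MemoryGranularity.Episodes, N) would cap it by number of episodes instead
        memory = EpisodicExperienceReplay(max_size=(MemoryGranularity.Transitions, 1000000))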

    EpisodicHindsightExperienceReplay

    • class rl_coach.memories.episodic.EpisodicHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)
    • Implements Hindsight Experience Replay as described in the following paper: https://arxiv.org/abs/1707.01495

      • Parameters
        • max_size – The maximum size of the memory. Should be defined in a granularity of Transitions

        • hindsight_transitions_per_regular_transition – The number of hindsight artificial transitions to generate for each actual transition

        • hindsight_goal_selection_method – The method that will be used for generating the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

        • goals_space – A GoalsSpace which defines the base properties of the goals space
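
      A construction sketch. The GoalsSpace/ReachingGoal arguments and the goal name follow the Fetch HER preset and are illustrative assumptions, not part of this class's API:

        from rl_coach.memories.episodic import EpisodicHindsightExperienceReplay
        from rl_coach.memories.episodic.episodic_hindsight_experience_replay import HindsightGoalSelectionMethod
        from rl_coach.memories.memory import MemoryGranularity
        from rl_coach.spaces import GoalsSpace, ReachingGoal

        # sparse -1/0 reward; the goal counts as reached within a Euclidean distance of 0.05
        goals_space = GoalsSpace(goal_name='desired_goal',
                                 reward_type=ReachingGoal(distance_from_goal_threshold=0.05,
                                                          goal_reaching_reward=0,
                                                          default_reward=-1),
                                 distance_metric=GoalsSpace.DistanceMetric.Euclidean)

        # generate 4 artificial hindsight transitions per real transition, picking substitute
        # goals from states achieved later in the same episode ('future' strategy)
        memory = EpisodicHindsightExperienceReplay(
            max_size=(MemoryGranularity.Transitions, 1000000),
            hindsight_transitions_per_regular_transition=4,
            hindsight_goal_selection_method=HindsightGoalSelectionMethod.Future,
            goals_space=goals_space)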

    EpisodicHRLHindsightExperienceReplay

    • class rl_coach.memories.episodic.EpisodicHRLHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)
    • Implements HRL Hindsight Experience Replay as described in the following paper:

    This is the memory you should use if you want a shared hindsight experience replay buffer between multiple workers.

    • Parameters
      • max_size – The maximum size of the memory. Should be defined in a granularity of Transitions

      • hindsight_transitions_per_regular_transition – The number of hindsight artificial transitions to generate for each actual transition

      • hindsight_goal_selection_method – The method that will be used for generating the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

      • goals_space – A GoalsSpace which defines the properties of the goals

      • do_action_hindsight – Replace the action (sub-goal) given to a lower layer with the actual achieved goal

    Non-Episodic Memories

    BalancedExperienceReplay

    • class rl_coach.memories.non_episodic.BalancedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True, num_classes: int = 0, state_key_with_the_class_index: Any = 'class')[source]
      • Parameters
        • max_size – the maximum number of transitions or episodes to hold in the memory

        • num_classes – the number of classes in the replayed data

        • state_key_with_the_class_index – the class index is assumed to be a value in the state dictionary. This parameter determines the key to retrieve the class index value
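
      A construction sketch with illustrative values; the buffer balances sampled batches across the given number of classes, reading each transition's class index from its state dictionary:

        from rl_coach.memories.non_episodic import BalancedExperienceReplay
        from rl_coach.memories.memory import MemoryGranularity

        # sample batches evenly across 10 classes; each stored state dict is expected to hold
        # its class index under the 'class' key (the default state_key_with_the_class_index)
        memory = BalancedExperienceReplay(
            max_size=(MemoryGranularity.Transitions, 100000),
            num_classes=10,
            state_key_with_the_class_index='class')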

    QDND

    • class rl_coach.memories.non_episodic.QDND(dict_size, key_width, num_actions, new_value_shift_coefficient=0.1, key_error_threshold=0.01, learning_rate=0.01, num_neighbors=50, return_additional_data=False, override_existing_keys=False, rebuild_on_every_update=False)
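
      A construction sketch using the documented signature; QDND is the differentiable neural dictionary used by NEC-style agents, and the sizes below are illustrative:

        from rl_coach.memories.non_episodic import QDND

        # one dictionary per action: up to 500,000 keys per action, each key a 512-dim embedding;
        # stored values are nudged toward newly observed returns by new_value_shift_coefficient
        dnd = QDND(dict_size=500000,
                   key_width=512,
                   num_actions=6,
                   new_value_shift_coefficient=0.1,
                   key_error_threshold=0.01,
                   num_neighbors=50)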

    ExperienceReplay

    • class rl_coach.memories.non_episodic.ExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True)
      • Parameters
        • max_size – the maximum number of transitions or episodes to hold in the memory

        • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
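
      A sketch of filling and sampling the flat buffer, assuming the store/sample methods of the memory API and the Transition container from rl_coach.core_types (details may differ slightly across versions):

        import numpy as np

        from rl_coach.core_types import Transition
        from rl_coach.memories.memory import MemoryGranularity
        from rl_coach.memories.non_episodic import ExperienceReplay

        memory = ExperienceReplay(max_size=(MemoryGranularity.Transitions, 50000),
                                  allow_duplicates_in_batch_sampling=False)

        # states are dictionaries mapping observation names to arrays
        for i in range(100):
            memory.store(Transition(state={'observation': np.random.rand(4)},
                                    action=i % 2,
                                    reward=0.0,
                                    next_state={'observation': np.random.rand(4)},
                                    game_over=False))

        batch = memory.sample(32)  # a list of Transition objects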

    PrioritizedExperienceReplay

    • class rl_coach.memories.non_episodic.PrioritizedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], alpha: float = 0.6, beta: rl_coach.schedules.Schedule, epsilon: float = 1e-06, allow_duplicates_in_batch_sampling: bool = True)[source]
    • This is the proportional sampling variant of the prioritized experience replay as described in https://arxiv.org/pdf/1511.05952.pdf.

      • Parameters
        • max_size – the maximum number of transitions or episodes to hold in the memory

        • alpha – the alpha prioritization coefficient

        • beta – the beta parameter used for importance sampling

        • epsilon – a small value added to the priority of each transition
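
      A construction sketch; the LinearSchedule used for beta is an assumption (any rl_coach.schedules.Schedule can be passed) that anneals the importance-sampling correction from 0.4 to 1.0 as in the paper:

        from rl_coach.memories.memory import MemoryGranularity
        from rl_coach.memories.non_episodic import PrioritizedExperienceReplay
        from rl_coach.schedules import LinearSchedule

        memory = PrioritizedExperienceReplay(
            max_size=(MemoryGranularity.Transitions, 1000000),
            alpha=0.6,                               # how strongly priorities (TD-errors) skew sampling
            beta=LinearSchedule(0.4, 1.0, 1000000),  # importance-sampling exponent, annealed over 1M steps
            epsilon=1e-06)                           # keeps every transition's priority strictly positive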

    TransitionCollection

    • Simple Python implementation of the transitions collection that non-episodic memories are constructed on top of.