    Explanations

    references to reinforcement learning concepts

    Negative Logits
    " GenerationType"   -0.70
    "ftagPool"          -0.69
    " AssemblyProduct"  -0.68
    "AutoresizingMask"  -0.66
    "onshire"           -0.57
    " bezeichneter"     -0.55
    " findViewById"     -0.55
    "DebuggerNonUser"   -0.54
    " Bride"            -0.51
    "#+#"               -0.51

    Positive Logits
    " reward"           0.95
    " policy"           0.88
    " Reward"           0.87
    " agent"            0.86
    "Reward"            0.84
    " rewards"          0.83
    " Policy"           0.83
    " env"              0.82
    "reward"            0.81
    " Agent"            0.81

    Activation Density: 0.299%

    No Known Activations