INDEX
    Explanations

    instances of high-stakes decision-making or options in games or simulations

    Negative Logits
    Token         Logit
    "ipay"        -0.15
    ".ali"        -0.15
    "bilt"        -0.15
    "bis"         -0.15
    "enÃŃ"        -0.15
    "rey"         -0.15
    "åĪ"          -0.14
    "emme"        -0.14
    "OLON"        -0.14
    "eras"        -0.14
    Positive Logits
    Token         Logit
    " g"          0.18
    " b"          0.17
    "xe"          0.17
    " Bd"         0.17
    " Kh"         0.17
    "White"       0.17
    " followed"   0.17
    " White"      0.16
    " h"          0.16
    " Be"         0.16
    Activation Density: 0.000%

    No Known Activations