    Explanations

    narratives involving escape and survival

    Negative Logits
    Token        Logit
    " bottom"    -0.17
    "bottom"     -0.16
    "forcer"     -0.14
    "iston"      -0.14
    "asing"      -0.14
    " Feed"      -0.14
    " Ulus"      -0.14
    "-bottom"    -0.14
    "巡"         -0.14
    "435"        -0.13
    Positive Logits
    Token        Logit
    " escape"    0.75
    " Escape"    0.62
    " escapes"   0.61
    " escaping"  0.59
    "escape"     0.59
    "Escape"     0.57
    " escaped"   0.57
    "逃"         0.57
    " flee"      0.56
    " fleeing"   0.51
    Activation Density: 0.243%

    No Known Activations