    Explanations

    phrases indicating actions related to personal experiences or feelings

    NEGATIVE LOGITS
    Token       Logit
    "ſelf"      -0.74
    " Anſ"      -0.68
    " itſelf"   -0.65
    " iſt"      -0.65
    " Eſ"       -0.64
    "ſelves"    -0.63
    " Diſ"      -0.63
    " Theſe"    -0.62
    " Efq"      -0.61
    " poffe"    -0.61
    POSITIVE LOGITS
    Token       Logit
    " decided"  0.98
    " décide"   0.79
    " opted"    0.79
    " chose"    0.74
    " решили"   0.74
    " began"    0.74
    " decidió"  0.73
    " went"     0.72
    "特意"      0.70
    " решила"   0.70
    Activation Density: 0.453%

    No Known Activations