INDEX
    Explanations

    phrases related to personal experiences and emotional expressions

    Negative Logits
    Token                  Logit
    " darn"                -0.70
    " Everybody"           -0.67
    "Everybody"            -0.66
    " everybody"           -0.65
    "damn"                 -0.62
    " damn"                -0.61
    "Gimme"                -0.59
    " referrerpolicy"      -0.56
    "########."            -0.55
                           -0.55

    Positive Logits
    Token                  Logit
    " myſelf"               0.69
    " kleid"                0.67
    "AxisAlignment"         0.67
    " edelstahl"            0.67
    " zoude"                0.65
    "DebuggerNonUser"       0.64
    " män"                  0.62
    "Дан"                   0.61
    "/−"                    0.60
    " genieten"             0.60

    Activation Density: 0.950%

    No Known Activations