    Explanations

    strong emotional states

    Negative Logits
    !
    1.65
    1.65
    !\
    1.64
    ..!
    1.64
    !“
    1.63
    ...!
    1.63
    !".
    1.61
    !",
    1.61
    !";
    1.60
    !"
    1.60
    Positive Logits
     fucking
    1.03
    Yeah
    0.96
     fucked
    0.95
     Yeah
    0.93
    においては
    0.88
    Fuck
    0.84
     fuck
    0.82
    に関しては
    0.80
     באופן
    0.79
     Fuck
    0.78
    Activation Density: 0.048%

    No Known Activations