INDEX
    Explanations

    sentences that convey conditional reasoning or personal insights

    Negative Logits
    " skall"         -0.78
    " muß"           -0.75
    " läßt"          -0.74
    "・・・・・"        -0.63
    " denominado"    -0.61
    "であるが"       -0.58
    " lecz"          -0.57
    " mußte"         -0.57
    " yoktur"        -0.56
    " آنان"          -0.56
    Positive Logits
    " shitty"     1.08
    " tryna"      1.08
    " tbh"        1.07
    " weirdly"    1.06
    " whatnot"    1.04
    " idk"        1.04
    " fucked"     1.02
    " goddamn"    1.00
    " lemme"      0.98
    " kinda"      0.98
    Activation Density: 1.698%

    No Known Activations