    NEGATIVE LOGITS
     indywidual   -0.48
     eenvoudig    -0.45
     eenvou       -0.43
     güçlü        -0.41
    iertamente    -0.41
     prakty       -0.40
     przede       -0.39
     fisik        -0.39
     berbeda      -0.39
     simplement   -0.39
    POSITIVE LOGITS
     fucking        0.93
     goddamn        0.91
     hipster        0.85
     fuckin         0.85
     shitty         0.79
     apocalypse     0.79
     motherfucker   0.78
     hilar          0.78
    Cyfarwyddwr     0.78
    fucking         0.78
    Activation Density: 1.833%

    No Known Activations