    Negative Logits
        Token          Value
        " spont"       0.43
        " praktisch"   0.43
        " Bud"         0.43
        " Zero"        0.42
        " Basically"   0.41
        "Bud"          0.40
        " bassa"       0.40
        " cravings"    0.39
        " Pneum"       0.39
        " Nothing"     0.39
    Positive Logits
        Token          Value
        "<0x80>"       0.55
        (unrendered)   0.50
        (unrendered)   0.48
        "ajouter"      0.47
        "EXE"          0.46
        (unrendered)   0.46
        "нский"        0.46
        " ਅਤੇ"          0.45
        "のかもし"      0.44
        "Referências"  0.44
    Activation Density: 0.041%

    No Known Activations
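    The activation density of 0.041% reported above is the share of token positions on which this feature fires, roughly 4 tokens in every 10,000. The following is a minimal sketch of that calculation. It assumes density means the fraction of positions whose activation is strictly above a threshold of zero. The function name `activation_density` and the `threshold` parameter are hypothetical and are not taken from this page.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions where the feature fires.

    Assumption (hypothetical definition): a position "fires" when its
    activation is strictly greater than `threshold`.
    """
    if len(acts) == 0:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Made-up data: 100,000 token positions, of which 41 have a positive activation.
acts = [0.0] * 100_000
for i in range(41):
    acts[i] = 1.3

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.041%
```

    The 100,000-token input is invented. It was chosen so that 41 firing positions reproduce the 0.041% figure (41 / 100,000 = 0.00041).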