    Negative Logits (tokens quoted; a leading space is part of the token)
    ".country"       -0.07
    "/S"             -0.07
    " φ"             -0.07
    "/P"             -0.07
    " unser"         -0.07
    "\Resource"      -0.06
    " samo"          -0.06
    "ÓN"             -0.06
    " CD"            -0.06
    "Nr"             -0.06

    Positive Logits
    " ocur"           0.09
    "zyst"            0.07
    " hack"           0.06
    " disappointed"   0.06
    "$mail"           0.06
    "Este"            0.06
    "ocities"         0.06
    " shim"           0.06
    " poisoned"       0.06
    "くだ"            0.06
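Top-logit lists like the two above are commonly produced by taking the dot product of a feature's decoder direction with each token's unembedding vector, then keeping the k most negative and k most positive results. The sketch below assumes that definition. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not this dashboard's actual code.

```python
import heapq


def logit_effects(decoder_dir, unembed, vocab):
    """Dot the feature's decoder direction with each token's unembedding column.

    Hypothetical layout: unembed[d][t] is a d_model x vocab matrix as nested lists.
    """
    return {
        tok: sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        for t, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
vocab = ["a", "b", "c"]
decoder_dir = [1.0, -1.0]
unembed = [[0.5, 0.1, 0.0],
           [0.2, 0.3, -0.4]]

effects = logit_effects(decoder_dir, unembed, vocab)
neg, pos = top_and_bottom(effects, k=1)
print(neg, pos)
```

Here token "b" gets the most negative effect (0.1 - 0.3) and "c" the most positive (0.0 + 0.4). `heapq.nsmallest`/`nlargest` avoid sorting the whole vocabulary, which matters when it has tens of thousands of entries.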
    Activation Density 0.000%

    No Known Activations
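Activation density is usually read as the fraction of sampled tokens on which the feature's activation is above zero, so a reading of 0.000% together with "No Known Activations" suggests the feature did not fire on any sampled token. The sketch below assumes that definition and a threshold of 0.0; `activation_density` is a hypothetical helper, not this dashboard's code.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        return 0.0  # no samples: report zero density rather than divide by zero
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# A feature that never fires on the sampled tokens.
density = activation_density([0.0, 0.0, 0.0, 0.0])
print(f"Activation Density {density:.3%}")  # Activation Density 0.000%
```

The `%` format spec multiplies by 100 and appends a percent sign, matching the three-decimal display above.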