    Negative Logits
    "acebook"    -0.08
    "LOOR"       -0.08
    "LOAT"       -0.07
    " Naked"     -0.07
    " Nick"      -0.07
    "amiliar"    -0.07
    "riendly"    -0.07
    "-over"      -0.07
    "ällt"       -0.06
    "IELDS"      -0.06

    Positive Logits
    " F"         0.13
    " f"         0.11
    ".F"         0.11
    "f"          0.11
    "Fu"         0.11
    " Fang"      0.10
    " FI"        0.10
    ",f"         0.10
    ":F"         0.10
    "_f"         0.10

    Act Density 0.444%

    No Known Activations