    Negative Logits
    åt
    -0.08
    oris
    -0.08
    nable
    -0.08
     hydrated
    -0.08
     ব্যবস্থা
    -0.08
    skom
    -0.08
     систему
    -0.07
    truck
    -0.07
    fors
    -0.07
     behaving
    -0.07
    Positive Logits
    0.08
     भाग
    0.07
    शी
    0.07
    Dean
    0.07
    Fort
    0.07
     Robertson
    0.07
    Bra
    0.07
    Keith
    0.07
     सार
    0.07
     해서
    0.07
    Activation Density: 0.007%

    No Known Activations