INDEX
    Explanations
    Negative Logits
    " fresh"          -0.07
    " native"         -0.07
    "_StaticFields"   -0.07
    " acceler"        -0.07
    "atör"            -0.07
    " Vect"           -0.07
    " woman"          -0.07
    " detained"       -0.07
    " expres"         -0.07
    " Native"         -0.06
    Positive Logits
    "истра"            0.07
    (token not rendered) 0.06
    "ovation"          0.06
    " trustworthy"     0.06
    "Behavior"         0.06
    "<this"            0.06
    "<ID"              0.06
    "oston"            0.06
    "ije"              0.05
    "throp"            0.05
    Activation Density 0.068%

    No Known Activations