    Negative Logits
    " Gir"           -0.08
    " realizar"      -0.08
    " futuristic"    -0.07
    " nade"          -0.07
    "engen"          -0.07
    " upgrade"       -0.07
    " sadrž"         -0.07
    " humanities"    -0.07
    (blank)          -0.07
    " monetize"      -0.07
    Positive Logits
    (blank)          0.08
    " lum"           0.08
    "lum"            0.08
    " besproken"     0.07
    ".Scope"         0.07
    " bowls"         0.07
    " канц"          0.07
    "rice"           0.07
    " arguing"       0.07
    " Thanksgiving"  0.07
    Activation Density 0.003%

    No Known Activations