INDEX
    Negative Logits
    " posterior"    -0.08
    " Torres"       -0.08
    "Apple"         -0.07
    " zaten"        -0.07
    " فرو"          -0.07
    "Turkey"        -0.07
    " Athen"        -0.07
    " зуб"          -0.07
    "tar"           -0.07
    " tom"          -0.06
    Positive Logits
    " kind"         0.11
    " Kind"         0.09
    " KIND"         0.09
    "kind"          0.08
    " grade"        0.08
    "_KIND"         0.08
    "ankind"        0.08
    ".Kind"         0.08
    " вид"          0.08
    "Kind"          0.07
    Activation Density: 0.030%

    No Known Activations