    Negative Logits
    Token               Logit
    " děti"             -0.81
    (missing)           -0.79
    "enschappelijke"    -0.78
    "palabras"          -0.78
    "راحة"              -0.78
    " pracuje"          -0.78
    " Differenz"        -0.77
    "ملك"               -0.76
    "костюм"            -0.75
    " Inhalation"       -0.75
    Positive Logits
    Token               Logit
    "toArray"           0.77
    "Nodes"             0.75
    "hour"              0.74
    "Std"               0.73
    " manifesto"        0.71
    " hour"             0.71
    "Apple"             0.69
    " gres"             0.68
    " мо"               0.68
    "یه"                0.68
    Activation Density: 0.010%

    No Known Activations