INDEX
    Negative Logits
    (missing)       -0.85
    " jengibre"     -0.85
    (missing)       -0.83
    " vaikka"       -0.82
    " mikina"       -0.81
    " menjawab"     -0.81
    "ี้ยง"           -0.79
    " kekerasan"    -0.79
    " bayi"         -0.78
    " kualitas"     -0.78
    Positive Logits
    ","             0.80
    "образ"         0.79
    " الأعلى"       0.78
    " util"         0.77
    " خصوص"         0.73
    " الأمر"        0.72
    " Werke"        0.71
    "kości"         0.70
    " Jal"          0.69
    " которой"      0.68
    Act Density 0.000%

    No Known Activations