    Explanations

    abilities and qualities

    Negative Logits
    Token             Value
    " lightness"      0.53
    " собой"          0.52
    " fateful"        0.52
    " debatable"      0.52
    " drastic"        0.50
    " frustrating"    0.50
    " baseless"       0.50
    " ominous"        0.49
    "товый"           0.49
    " слегка"         0.48

    Positive Logits
    Token             Value
    " who"            0.98
    " الذين"          0.95
    " whose"          0.80
    "who"             0.79
    " cuyos"          0.78
    " quienes"        0.77
    " are"            0.75
    " смогут"         0.74
    " którzy"         0.73
    " μπορούν"        0.72
    Activation Density: 0.021%

    No Known Activations