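Negative Logits
    " بودند"    0.44   (Persian "were")
    "Ф"         0.44
    "是一些"     0.43   (Chinese "are some")
    "Φ"         0.42
    " شدند"     0.41   (Persian "became / were")
    " आहेत"     0.39   (Marathi "are")
    " هستند"    0.39   (Persian "are")
    "Ш"         0.38
    " असतील"    0.37   (Marathi "will be")
    " közül"    0.37   (Hungarian "from among")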
Positive Logits
    " incapable"    0.55
    " obsessed"     0.50
    "doesn"         0.49
    " goes"         0.46
    " unreliable"   0.46
    " doesn"        0.45
    " afraid"       0.45
    " unworthy"     0.44
    " owns"         0.44
    " unable"       0.43
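The two lists above name the tokens whose logits the feature most suppresses and most promotes. On dashboards of this kind they are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and taking the bottom-k and top-k entries; the page does not state this, so treat it as an assumption. Below is a minimal pure-Python sketch under that assumption. The names `top_and_bottom_logits`, `decoder_dir`, and `unembed` are hypothetical, and the sketch returns signed values, whereas the page lists positive numbers under Negative Logits (it may be displaying magnitudes).

```python
from typing import List, Tuple


def top_and_bottom_logits(
    decoder_dir: List[float],    # hypothetical: the feature's decoder vector, length d_model
    unembed: List[List[float]],  # hypothetical: unembedding matrix, d_model rows x vocab columns
    vocab: List[str],            # token strings, one per unembedding column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, logit effect).

    Assumption: the logit effect of the feature on token j is the dot
    product of decoder_dir with column j of unembed.
    """
    n_vocab = len(vocab)
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(n_vocab)
    ]
    # Sort token indices from most suppressed to most promoted.
    order = sorted(range(n_vocab), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = top_and_bottom_logits(
    [1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1
)
print(pos, neg)
```

With a real model, `vocab` would hold tokenizer strings, which is why a leading space (" doesn" versus "doesn") marks a distinct token in the lists above.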
    Act Density 0.013%
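Read as the percentage of sampled tokens on which the feature is active, an Act Density of 0.013% means roughly 13 firing tokens per 100,000. That reading is an assumption about how the dashboard defines density. A minimal sketch under that assumption follows, where `activation_density` and its `threshold` parameter are hypothetical names:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one token")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 13 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for idx in range(13):
    acts[idx] = 1.0
print(activation_density(acts))
```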

    No Known Activations