    Negative Logits
    " slaying"      0.63
    "ی"             0.60
    "の場合は"      0.59
    "s"             0.58
    " сумка"        0.55
    "ي"             0.54
    " сильнее"      0.53
    " 했습니다"     0.52
    " нередко"      0.52
    "ע"             0.52

    Positive Logits
    " cool"         0.91
    " coolness"     0.84
    "cool"          0.76
    " coolest"      0.71
    " 😎"           0.64
    " Cool"         0.62
    " trendy"       0.59
    "😎"            0.59
    " increí"       0.57
                    0.57

    Act Density 0.018%

    No Known Activations