    Negative Logits
    logit  token
    1.14
    1.01
    0.97   " ощущение"
    0.93   " llena"
    0.90   " veliko"
    0.89   " бонусы"
    0.89   "ுகிற"
    0.89   " ذریع"
    0.88   " sociali"
    0.87   " altri"
    Positive Logits
    logit  token
    1.22   "r"
    1.16   "ש"
    1.14   "ر"
    1.09   "k"
    1.04   "v"
    1.01   "j"
    0.98
    0.98
    0.98   "c"
    0.95   "رر"
    Act Density 0.059%

    No Known Activations