    Negative Logits
                    1.06
     imassa         1.03
     buddhav        1.02
     astrophys      0.98
     sofas          0.98
     vibhav         0.97
     Omphalodes     0.96
     vijj           0.95
     bandages       0.94
     ziff           0.93
    Positive Logits
     כ    1.73
     מ    1.66
     ה    1.62
     ש    1.61
     ת    1.55
    כ     1.54
     ע    1.52
     א    1.50
     ל    1.47
    ה     1.47
    Act Density 0.005%

    No Known Activations