    Negative Logits
        "ל"        1.38
        "л"        1.01
        "нг"       0.95
        "ד"        0.91
        "لي"       0.87
        "ע"        0.84
        " in"      0.83
        "с"        0.81
        " amar"    0.80
        " alli"    0.79
    Positive Logits
        "ilers"    0.98
        "ose"      0.95
        "urt"      0.95
        " Biden"   0.95
        "US"       0.93
        "ą"        0.88
        "ous"      0.87
        "ings"     0.86
        "ial"      0.83
        "ess"      0.83
    Activation density: 0.003%

    No Known Activations