INDEX
    Explanations
    Negative Logits
    is       0.99
    ot       0.95
    en       0.94
    و        0.91
    at       0.89
    er       0.81
    el       0.80
    us       0.80
    ic       0.79
    os       0.79
    Positive Logits
    .。      0.67
    ٫        0.66
    .        0.65
    .        0.65
    Algun    0.65
             0.63
    ljena    0.60
    .{       0.60
    ке       0.59
    アニメ   0.59
    Act Density 0.092%

    No Known Activations