INDEX
    Negative Logits
    Token          Value
    ":"            1.05
    "-"            0.93
    "cribing"      0.80
    " of"          0.80
    "in"           0.79
    "essage"       0.74
    ","            0.74
    "straction"    0.71
    "aining"       0.69
    "alers"        0.67
    Positive Logits
    Token             Value
    "و"               0.85
    " )}"             0.83
    "к"               0.76
    "на"              0.75
    "ки"              0.74
    " stillness"      0.73
    "ل"               0.72
    (not rendered)    0.71
    (not rendered)    0.68
    "т"               0.68
    Activation Density: 0.010%

    No Known Activations