INDEX
    Negative Logits
    Token       Value
    "."         0.58
    "IRE"       0.55
    "ك"         0.54
    "1"         0.47
    "ي"         0.47
    (blank)     0.46
    "eu"        0.45
    "eur"       0.44
    "Ik"        0.44
    "ла"        0.44
    Positive Logits
    Token         Value
    " in"         0.52
    " took"       0.51
    "zeitig"      0.48
    " has"        0.47
    "ttemberg"    0.46
    "あれば"      0.45
    " traitor"    0.45
    " almighty"   0.45
    " Одна"       0.44
    (blank)       0.44
    Act Density 0.302%

    No Known Activations