INDEX
    Explanations
    Negative Logits
        " neurons"              -0.07
        " Erdoğan"              -0.07
        [token not rendered]    -0.06
        " spend"                -0.06
        " calming"              -0.06
        " کمتر"                 -0.06
        [token not rendered]    -0.06
        " contagious"           -0.06
        " extr"                 -0.06
        "araoh"                 -0.06
    Positive Logits
        " Glob"                 0.07
        "tg"                    0.07
        "�璃"                   0.07
        " schizophren"          0.07
        "bab"                   0.06
        "Tem"                   0.06
        "(';"                   0.06
        " gentle"               0.06
        "งค"                    0.06
        "(tk"                   0.06
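Assuming these logit values follow the common convention for sparse-autoencoder feature pages (the feature's decoder direction multiplied by the model's unembedding matrix, then ranked), a minimal pure-Python sketch of that ranking follows. The function names, the toy vectors, and the projection itself are illustrative assumptions, not this page's actual pipeline.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: effect[t] = sum_i decoder_dir[i] * unembed[i][t]  (i.e. W_dec[f] @ W_U).

def logit_effects(decoder_dir, unembed):
    """decoder_dir: d_model floats; unembed: d_model rows of vocab_size floats."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(effects, vocab, k=10):
    """Return (negative, positive): the k lowest and k highest (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
vocab = ["a", "b", "c"]
effects = logit_effects([1.0, -1.0], [[0.5, 0.1, -0.2], [0.1, 0.3, 0.2]])
negative, positive = top_logits(effects, vocab, k=2)
print(negative)  # tokens "c" then "b"
print(positive)  # tokens "a" then "b"
```

With k=10 this yields two ten-entry lists like the ones above; the real values would depend on the model's actual weights.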
    Activation Density: 0.001%
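Reading activation density as the share of sampled token positions on which the feature fires, expressed as a percentage, 0.001% corresponds to roughly one active token in 100,000. That definition, the zero threshold, and the function name below are assumptions for illustration.

```python
# Hypothetical sketch: activation density as a percentage of token positions.
# Assumption: a feature counts as "active" when its activation exceeds 0.

def activation_density(activations, threshold=0.0):
    """Percent of positions whose activation is strictly above threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One nonzero activation among 100,000 token positions.
acts = [0.0] * 100_000
acts[42] = 3.1
print(activation_density(acts))  # 0.001
```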

    No Known Activations