INDEX
    Explanations
    Negative Logits
    EMPTY            -0.07
     Copy            -0.07
                     -0.07
     умов            -0.06
     kindergarten    -0.06
     состав          -0.06
     reforms         -0.06
     그러            -0.06
    ंभ               -0.06
     british         -0.06
    Positive Logits
    #'               0.07
     Ging            0.07
    .cost            0.06
    itchen           0.06
    requete          0.06
     люд             0.06
    sand             0.06
     Jakarta         0.06
     hinted          0.06
     Ост             0.06
    Activation Density: 0.158%

    No Known Activations