    Negative Logits
    (missing token)   -0.07
    " imshow"         -0.07
    " illnesses"      -0.07
    " подраз"         -0.07
    " напрям"         -0.07
    " воду"           -0.06
    " трен"           -0.06
    " люд"            -0.06
    " deniz"          -0.06
    " bartender"      -0.06
    Positive Logits
    " Gothic"         0.15
    " Goth"           0.10
    "thic"            0.09
    "oth"             0.07
    "pta"             0.06
    "เทพ"             0.06
    " notch"          0.06
    "|max"            0.06
    "otti"            0.06
    "itty"            0.06
    Activation density: 0.001%

    No known activations.
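Activation density is commonly read as the share of sampled tokens on which a feature fires. Under that reading, 0.001% means roughly 1 token in 100,000. The dashboard does not state its sampling corpus or its firing threshold. This sketch assumes that any strictly positive activation counts as firing. `activation_density` is a hypothetical helper, not part of any particular tool.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of sampled tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is strictly
    greater than `threshold` (default 0.0); the dashboard's own rule is not given.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 2 firing tokens out of 200,000 sampled gives 0.001%.
acts = [0.0] * 199_998 + [1.3, 0.4]
print(activation_density(acts))
```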