    Explanations

    Foreign language questions/greetings

    Negative Logits
    Token        Logit
    " Gow"       -0.06
    "ời"         -0.06
    "Kitchen"    -0.06
    " зовніш"    -0.06
    " bear"      -0.06
    "็อก"         -0.06
    " scant"     -0.06
    "ласти"      -0.05
    " tối"       -0.05
    " torso"     -0.05
    Positive Logits
    Token          Logit
    " Monitoring"  0.07
    "TM"           0.07
    (whitespace)   0.07
    " punitive"    0.07
    "कर"           0.07
    " окрем"       0.07
    "292"          0.06
    "_mag"         0.06
    " تنظيف"       0.06
    "255"          0.06
    Act Density 0.001%

    No Known Activations