INDEX
    Explanations
    NEGATIVE LOGITS
    "ocial"           -0.06
    "group"           -0.06
    " weren"          -0.06
    "-local"          -0.06
                      -0.06
    " expressions"    -0.06
    "/people"         -0.06
    " revolutions"    -0.06
    "\tif"            -0.06
    "'re"             -0.06
    POSITIVE LOGITS
    "ितन"             0.07
    " dolayı"         0.07
    " dětí"           0.07
    "ída"             0.06
    "üssen"           0.06
    " Fransız"        0.06
    "eníze"           0.06
    " 있다는"          0.06
    " dispar"         0.06
    " الفر"           0.06
    Activation Density: 0.001%

    No Known Activations