INDEX
    Negative Logits
    " dich"         -0.08
    " י"            -0.08
    " forum"        -0.07
    " sing"         -0.07
    " conceived"    -0.07
    " víct"         -0.07
    " bri"          -0.07
    " dang"         -0.07
    " darn"         -0.07
    "essoas"        -0.07
    Positive Logits
    " ก่อน"          0.09
                     0.08
    " Før"           0.08
    " sebelum"       0.08
    " nared"         0.08
    " Nadal"         0.08
    "_cov"           0.08
    " Transformers"  0.08
    "\tbefore"       0.07
    " trước"         0.07
    Act Density 0.002%

    No Known Activations