INDEX
    Explanations
    Negative Logits
        " soigne"        -0.08
        "оза"            -0.08
        "是在"            -0.08
        " humains"       -0.08
        "uels"           -0.08
        " sociaux"       -0.08
        " کی"            -0.08
        " propios"       -0.08
        "etho"           -0.08
        " economists"    -0.08
    Positive Logits
        " સ્પ"            0.07
        " subordinate"   0.07
        " घूम"            0.07
        " daughter"      0.07
        "Sp"             0.07
        " chords"        0.07
        " spanning"      0.07
        "_sp"            0.07
        " multiplic"     0.07
        " shapes"        0.07
    Activation Density: 0.094%

    No Known Activations