INDEX
    Explanations
    NEGATIVE LOGITS
    Token                Logit
    "\tva"               -0.08
    "imd"                -0.08
    "-va"                -0.08
    "_Dep"               -0.08
    (token not shown)    -0.08
    " aér"               -0.07
    " Ship"              -0.07
    " sier"              -0.07
    "imy"                -0.07
    " doub"              -0.07
    POSITIVE LOGITS
    Token                Logit
    " flush"             0.08
    " belakang"          0.08
    "flush"              0.08
    " economía"          0.08
    " Flush"             0.08
    " გახ"               0.08
    " strengthening"     0.08
    " worsening"         0.08
    "abilidade"          0.08
    " rear"              0.08
    Act Density 0.002%

    No Known Activations