INDEX
    Explanations
    Negative Logits
     waterfront     -0.09
    ':              -0.09
    ':'             -0.08
     kern           -0.08
    :'/             -0.08
     vistas         -0.08
     crianças       -0.07
     сучас          -0.07
    vp              -0.07
     wirkt          -0.07
    Positive Logits
     manually       0.11
     mentally       0.09
    _manual         0.09
    Manual          0.08
     استخراج        0.08
     manual         0.08
     जानते          0.08
     मेहन           0.08
     직접           0.08
     surely         0.08
    Activation Density: 0.013%

    No Known Activations