    Negative Logits
    Token             Logit
    "_d"              -0.08
                      -0.08
    "_alloc"          -0.07
    "_ad"             -0.07
    " provocative"    -0.07
    "CY"              -0.07
    "info"            -0.07
    " Dzięki"         -0.07
    " forget"         -0.07
    "_samples"        -0.07
    Positive Logits
    Token             Logit
    " concerne"       0.09
    " nicely"         0.09
    " begs"           0.09
    " सवाल"           0.09
    "네요"             0.09
    " manutenção"     0.08
    " सुझाव"           0.08
    " fiquei"         0.08
    " связано"        0.08
    " needing"        0.08
    Activation Density: 0.077%

    No Known Activations