INDEX
    Explanations
    Negative Logits
    Token               Logit
    " salvage"          -0.09
    " insp"             -0.08
    "lwa"               -0.07
    " зад"              -0.07
    (no token text)     -0.07
    "liness"            -0.07
    (no token text)     -0.07
    " formule"          -0.07
    "λύ"                -0.07
    (no token text)     -0.07
    Positive Logits
    Token               Logit
    " Dragon"           0.08
    " Jerry"            0.08
    " لتع"              0.07
    " Debate"           0.07
    " líder"            0.07
    " Clause"           0.07
    "Dragon"            0.07
    " Leadership"       0.07
    " ગણ"               0.07
    " أبو"              0.07
    Activation Density: 0.002%

    No Known Activations