    Negative Logits
     Freed         0.45
     beneficios    0.43
     ጥቅ            0.42
     freed         0.42
     képes         0.41
    stabil         0.41
     بهبود         0.40
     pozitiv       0.40
    Stabil         0.40
    ^+$            0.39
    Positive Logits
     losing        1.00
     defeat        0.97
     losses        0.96
     lose          0.92
     humiliating   0.89
     loss          0.88
                   0.87
     Losing        0.86
                   0.86
     loses         0.85
    Activation Density: 0.417%

    No Known Activations