INDEX
    Explanations

    negation and impossibility

    NEGATIVE LOGITS
     фактически      0.74
     isom            0.72
     ऑलरेडी          0.71
     actual          0.71
     strongly        0.69
     entire          0.69
     nonzero         0.69
     effectivement   0.69
     volled          0.68
    めっちゃ          0.68
    POSITIVE LOGITS
     forgetting        1.05
     hesitation        1.05
    forgettable        0.98
     underestimates    0.94
     забы              0.94
     forgets           0.91
     underestimate     0.91
     shy               0.91
     disappointments   0.91
     doubt             0.90
    Activation Density 0.280%

    No Known Activations