    Negative Logits
    " lose"       -1.98
    " losing"     -1.69
    "losing"      -1.56
    " loosing"    -1.55
    "Lose"        -1.52
    " Losing"     -1.48
    " loses"      -1.46
    " loose"      -1.45
    " perdre"     -1.39
    "Losing"      -1.35
    Positive Logits
    "s"            0.75
    " a"           0.74
    " the"         0.68
    " interest"    0.65
    " it"          0.60
    " interests"   0.59
    " this"        0.59
    " an"          0.57
    " more"        0.57
    " its"         0.57
    Act Density: 0.026%

    No Known Activations