    Negative Logits
    Token     Value
    or        1.03
    ка        1.00
    w         0.88
    /         0.88
    т         0.85
    of        0.85
    to        0.84
    y         0.84
    as        0.83
    during    0.83
    Positive Logits
    Token     Value
     gotta    1.91
     been     1.84
     truths   1.70
     mantras  1.49
     BEEN     1.42
     gonna    1.41
     Been     1.40
     perks    1.40
     axioms   1.39
     clichés  1.37
    Activation Density: 0.017%

    No Known Activations