    Negative Logits
    Token               Logit
    " sorrow"           -0.07
    " Peyton"           -0.07
    " Ping"             -0.07
    " merry"            -0.06
    ".timer"            -0.06
    "pekt"              -0.06
    " Murray"           -0.06
    " contradiction"    -0.06
    " chronological"    -0.06
    " dart"             -0.06
    Positive Logits
    Token               Logit
    " base"              0.14
    "Base"               0.12
    " Base"              0.12
    "base"               0.12
    "_base"              0.11
    "ase"                0.11
    "/base"              0.11
    "_BASE"              0.11
    " bases"             0.10
    "BASE"               0.10
    Activation Density: 0.118%

    No Known Activations