    Explanations

    instances of formatting or punctuation typically associated with lists or data entries

    Negative Logits
    "ave"             -0.17
    "ember"           -0.14
    "ax"              -0.14
    "ervers"          -0.14
    " unthinkable"    -0.14
    "224"             -0.14
    "andel"           -0.14
    "ame"             -0.14
    "au"              -0.13
    "boy"             -0.13
    Positive Logits
    "agit"            0.17
    "hots"            0.15
    "haar"            0.15
    "points"          0.14
    "hoot"            0.14
    " Rebellion"      0.14
    "дина"            0.13
    "ìłIJ"            0.13
    "_via"            0.13
    "’ya"             0.13
    Activation Density: 0.001%

    No Known Activations