INDEX
    Explanations

    colons that often precede lists or detailed explanations

    Negative Logits
    Token         Logit
    "asher"       -0.19
    "odash"       -0.15
    "ạc"          -0.15
    "iyi"         -0.15
    "’na"         -0.14
    "-binary"     -0.14
    " thousand"   -0.14
    "ęk"          -0.13
    "έ"           -0.13
    "zcze"        -0.13
    Positive Logits
    Token         Logit
    "00"          0.49
    "30"          0.42
    "45"          0.34
    "oop"         0.27
    "oo"          0.27
    "15"          0.26
    "OO"          0.26
    "05"          0.24
    "pm"          0.23
    "۳۰"          0.23
    Act Density 0.037%

    No Known Activations