    Explanations

    Fractions written in the form X/Y, where the numerator and the denominator are both single-digit numbers.
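    To make the explanation concrete, the sketch below matches the text pattern it describes: a single digit, a slash, and a single digit. The regular expression and the function name `find_single_digit_fractions` are assumptions made for illustration. They are not taken from this page and do not reproduce how the feature itself activates.

```python
import re

# Assumed pattern (illustrative only): one digit, a slash, one digit,
# with lookarounds so the match is not part of a longer number.
SINGLE_DIGIT_FRACTION = re.compile(r"(?<!\d)\d/\d(?!\d)")


def find_single_digit_fractions(text: str) -> list[str]:
    """Return every X/Y substring where X and Y are single digits."""
    return SINGLE_DIGIT_FRACTION.findall(text)


if __name__ == "__main__":
    sample = "Mix 1/2 cup with 3/4 tsp, not 10/12 or 2024/5."
    print(find_single_digit_fractions(sample))  # ['1/2', '3/4']
```

    The lookbehind and lookahead reject multi-digit fractions such as 10/12, which fall outside the explanation as worded.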

    Negative Logits
    Token          Logit
    " Gret"        -0.68
    " bang"        -0.67
    " Vul"         -0.66
    " Merry"       -0.64
    " hon"         -0.64
    " obscene"     -0.63
    " Chop"        -0.63
    " Gavin"       -0.63
    " Mir"         -0.62
    " farewell"    -0.62
    Positive Logits
    Token          Logit
    "3"            0.94
    "2"            0.91
    "4"            0.87
    "lvl"          0.86
    "week"         0.85
    "DAY"          0.84
    "oct"          0.84
    "division"     0.80
    "5"            0.77
    "OTT"          0.77
    Activation Density: 0.020%

    No Known Activations