    Explanations

    words that convey specific numerical quantities or measurements

    Negative Logits
    ellan       -0.16
    ringe       -0.15
    thur        -0.15
    بط          -0.15
    ollen       -0.15
    prak        -0.14
    Rated       -0.14
     apl        -0.13
     Gu         -0.13
    adelphia    -0.13
    Positive Logits
    ÅĻik        0.16
    olin        0.15
    708         0.15
    ÃŃm         0.15
     setters    0.14
    olib        0.14
    stav        0.14
    DMI         0.14
    _bulk       0.14
    /target     0.14
    Activation Density: 0.021%

    No Known Activations