INDEX
    Explanations

    terms related to eating disorders

    instances of the word "ore."

    Negative Logits
     srf    -0.80
    ulk     -0.73
    cffff   -0.73
    otaur   -0.70
    ¥ŀ      -0.68
    ilts    -0.68
    arb     -0.68
    itars   -0.68
    insula  -0.68
    ued     -0.66
    Positive Logits
    tto     1.29
    gon     1.10
    tsky    1.08
    byss    1.05
    lli     1.05
    ttes    1.04
    xia     0.96
    nz      0.95
    tta     0.95
    cki     0.90
    Act Density 0.020%

    No Known Activations