INDEX
    Explanations
    Negative Logits
    帖最后由          -0.79
     étoient         -0.73
    ScopeManager     -0.71
     springfox       -0.69
     wikipagina      -0.69
    styleType        -0.69
     ujednoznacz     -0.68
     houſe           -0.66
    LLocation        -0.66
     تضيفلها         -0.65
    Positive Logits
     always           0.76
     never            0.72
     really           0.69
     just             0.68
     love             0.65
     think            0.60
     often            0.58
     strongly         0.56
     usually          0.56
     not              0.55
    Activation Density: 1.965%

    No Known Activations