    Negative Logits
    Token             Logit
    " homosexuals"    -0.07
    " divor"          -0.06
    "amente"          -0.06
    "十三"            -0.06
    "的地"            -0.06
    "/ref"            -0.06
    "orm"             -0.06
    "ặc"              -0.06
    "tro"             -0.06
    " Carlson"        -0.06
    Positive Logits
    Token             Logit
    " need"           0.15
    " needs"          0.14
    " Need"           0.13
    " needed"         0.12
    "need"            0.12
    "Need"            0.12
    " NEED"           0.11
    " Needs"          0.11
    "needs"           0.10
    "needed"          0.10
    Activation Density: 0.044%

    No Known Activations