INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
    "917"             -0.07
    " rehe"           -0.07
    " fraternity"     -0.06
    ".Transaction"    -0.06
    "043"             -0.06
    " nueva"          -0.06
    ".create"         -0.06
    " panor"          -0.06
    " sắp"            -0.06
    " само"           -0.06
    POSITIVE LOGITS
    Token                Logit
    " typ"               0.07
    "PERSON"             0.07
    " vandal"            0.06
    "staw"               0.06
    (token not shown)    0.06
    ",↵↵"                0.06
    "Dep"                0.06
    "ideo"               0.06
    (token not shown)    0.06
    "urve"               0.06
    Act Density 0.004%

    No Known Activations