INDEX
    Negative Logits
    Token          Logit
    "reen"         -0.80
    "moda"         -0.80
    "oka"          -0.79
    "сен"          -0.75
    "фом"          -0.73
    " Россий"      -0.72
    " Hwang"       -0.72
    " ngoài"       -0.71
    " スカート"      -0.71
    " privati"     -0.71
    Positive Logits
    Token                  Logit
    " disdain"             2.25
    " contempt"            2.09
    " looked"              1.93
    " condescending"       1.84
    " bel"                 1.80
    (token text missing)   1.77
    " look"                1.72
    " down"                1.70
    " despise"             1.69
    " underestimate"       1.67
    Activation Density: 0.027%
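
To put the density figure in concrete terms, the sketch below converts it into an expected count of activating tokens. It assumes that the density is the fraction of tokens on which the feature has a nonzero activation; the page itself does not define the term, and the helper name `expected_activations` is hypothetical.

```python
def expected_activations(density_percent: float, n_tokens: int) -> float:
    """Expected number of activating tokens out of n_tokens.

    Assumption: density_percent is the percentage of tokens on which
    the feature has a nonzero activation.
    """
    return density_percent / 100.0 * n_tokens


density_percent = 0.027  # value reported on this page
per_million = expected_activations(density_percent, 1_000_000)
print(f"{per_million:.0f} expected activations per 1,000,000 tokens")
```

Under that assumption the feature is sparse: it would fire on roughly 270 of every million tokens.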

    No Known Activations