INDEX
    Explanations
    Negative Logits
    Token           Logit
    " males"        -0.07
    " sich"         -0.07
    " humility"     -0.07
    "’de"           -0.06
    " fear"         -0.06
    " fare"         -0.06
    " centroid"     -0.06
    " Peak"         -0.06
    " dusty"        -0.06
    " Lak"          -0.06
    Positive Logits
    Token           Logit
    " revolution"   0.09
    " Revolution"   0.08
    "177"           0.08
    " Rebel"        0.07
    "revolution"    0.07
    "(convert"      0.06
    "Liquid"        0.06
    " robot"        0.06
    "WebRequest"    0.06
    "trl"           0.06
    Activation Density: 0.014%

    No Known Activations