INDEX
    Explanations

    negative stereotypes, especially important

    NEGATIVE LOGITS
     radix      0.48
     LE         0.44
    LE          0.43
     LEON       0.41
    ARIES       0.41
    ezers       0.40
     рад        0.40
    ERICK       0.40
     rada       0.38
    le          0.38
    POSITIVE LOGITS
    idey        0.40
    Ideally     0.38
    hani        0.38
    issant      0.38
    единен      0.37
     ideally    0.36
    ровании     0.36
     intimate   0.36
    bey         0.36
     виправи    0.36
    Act Density 0.001%

    No Known Activations