INDEX
    Explanations

    phrases related to authority figures and social complaints

    Negative Logits
    Token           Logit
    "obot"          -0.15
    "apper"         -0.14
    " majority"     -0.14
    "InProgress"    -0.14
    " kvin"         -0.14
    "ód"            -0.14
    "dogs"          -0.14
    "TestCase"      -0.14
    "adesh"         -0.14
    "466"           -0.13

    Positive Logits
    Token           Logit
    "282"           0.15
    "ussen"         0.14
    "cka"           0.14
    "imu"           0.14
    " Morales"      0.14
    "ượng"          0.14
    " *@"           0.14
    "урн"           0.14
    "uentes"        0.14
    "ikal"          0.13
    Activation Density: 0.491%

    No Known Activations