INDEX
    Explanations

    statements about actions and attributes of individuals, particularly focusing on their capabilities and experiences

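    Negative Logits
    (tokens are quoted so that leading spaces remain visible)
    " Blech"        -0.52
    " luogo"        -0.49
    "рс"            -0.49
    "oad"           -0.49
    "UVWXYZ"        -0.48
    "Unione"        -0.48
    " yourselves"   -0.48
    " stalo"        -0.48
    " başlad"       -0.47
    "jíma"          -0.47

    Positive Logits
    (tokens are quoted so that leading spaces remain visible)
    " himself"      1.71
    " Himself"      1.36
    "himself"       1.35
    " his"          1.13
    "His"           0.94
    "his"           0.93
    " He"           0.93
    " His"          0.92
    "因为他"        0.90
    "的他"          0.90
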
    Activation Density: 0.647%

    No Known Activations