    Explanations

    phrases related to communication and storytelling

    Negative Logits
    " I"        -0.67
    " we"       -0.65
    " 나는"      -0.60
    " मैंने"      -0.52
    "我自己"     -0.50
    " मैं"       -0.50
    "me"        -0.48
    " Myself"   -0.47
    "僕が"       -0.47
    "we"        -0.47
    Positive Logits
    " us"       2.59
    " нас"      1.39
    " Us"       1.23
    "us"        1.12
    "Us"        1.11
    " nás"      0.98
    " нам"      0.90
    " US"       0.81
    "让我们"     0.77
    " לנו"      0.76
    Activation Density: 0.261%

    No Known Activations