INDEX
    Explanations

    sentences directed at or mentioning the listener

    Negative Logits
    ¥µ            -0.81
    Ĥª            -0.80
    ¿½            -0.78
    ĺħ            -0.76
    entimes       -0.75
    ĸļ            -0.74
    enges         -0.73
    20439         -0.72
    EStream       -0.72
    ĨĴ            -0.71
    Positive Logits
     guys         1.43
    're           1.27
     yourselves   1.27
     gentlemen    1.02
    've           1.00
     sir          0.99
    tub           0.91
     yourself     0.89
    'll           0.88
     bast         0.87
    Act Density 0.147%
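
The density figure above is commonly read as the fraction of token positions on which the feature fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above zero. The function name `activation_density`, the `threshold` parameter, and the sample activations are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 2000 token positions.
acts = [0.0] * 1997 + [0.4, 1.2, 0.05]
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low indicates a sparse feature: it is silent on the vast majority of tokens and fires only in a narrow set of contexts.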

    No Known Activations