    Explanations

    dialogues that express conflict or personal experiences

    Negative Logits
    "ولا"          -0.14
    "figur"        -0.14
    " folks"       -0.14
    "ughter"       -0.14
    " upto"        -0.14
    "getattr"      -0.14
    " καθώς"       -0.14
    "那些"          -0.13
    "enis"         -0.13
    "variably"     -0.13
    Positive Logits
    " always"       0.18
    "always"        0.16
    " maybe"        0.16
    " inside"       0.15
    " also"         0.15
    " like"         0.15
    " craz"         0.15
    "228"           0.14
    " USA"          0.14
    "Inside"        0.14
    Activation Density: 0.108%

    No Known Activations