INDEX
    Explanations

    pronouns and verbs referring to actions performed by individuals

    references to specific individuals or characters

    Negative Logits
    Token          Logit
    " è£ıè"        -0.76
    "DAY"          -0.74
    "âĦ¢:"         -0.68
    "grave"        -0.68
    " Alas"        -0.67
    "ãĥ¤"          -0.65
    " Sao"         -0.63
    "ĵĺ"           -0.62
    " Unicorn"     -0.62
    "ielding"      -0.59
    Positive Logits
    Token          Logit
    "zbollah"      1.14
    "'ll"          1.08
    "'re"          1.08
    " ["           1.05
    " ain"         1.04
    " gotta"       0.99
    " didn"        0.97
    " got"         0.97
    "'ve"          0.94
    " mathemat"    0.94
    Activation Density: 0.218%

    No Known Activations