INDEX
    Explanations

    instances of dialogue or direct speech

    Negative Logits
     Oops
    -0.18
    icher
    -0.14
    omen
    -0.14
    _HINT
    -0.13
     Hoover
    -0.13
    데이트
    -0.13
    Damn
    -0.13
    lickr
    -0.13
    Funny
    -0.13
    lí
    -0.13
    Positive Logits
    agree
    0.23
     amen
    0.23
     Amen
    0.22
     agreed
    0.21
     agree
    0.21
    amen
    0.19
    /ag
    0.19
     Agree
    0.19
     agrees
    0.17
     exactly
    0.16
    Act Density 0.134%

    No Known Activations