    Explanations

    quotes or phrases in a question-response format

    question marks and conversational cues indicating uncertainty or requests for confirmation

    Negative Logits
        multiplied   -0.78
        fanc         -0.77
        fleeing      -0.77
        trave        -0.75
        chants       -0.75
        harassing    -0.75
        pict         -0.74
        stray        -0.74
        migr         -0.74
        forgotten    -0.73
    Positive Logits
        JM           1.46
        JB           1.41
        Answer       1.37
        JV           1.36
        RH           1.32
        EH           1.32
        MH           1.29
        JP           1.26
        DW           1.26
        JS           1.25
    Activation Density: 0.069%

    No Known Activations