    Explanations

    quotes attributed to a male speaker

    pronouns and their associated actions or references

    Negative Logits
    Token                     Logit
    "pired"                   -0.86
    "Had"                     -0.85
    "tained"                  -0.80
    "were"                    -0.76
    " Been"                   -0.73
    "Were"                    -0.70
    "ayed"                    -0.69
    "Offline"                 -0.67
    " guiActiveUnfocused"     -0.66
    " mattered"               -0.65
    Positive Logits
    Token                     Logit
    " asks"                   1.53
    " complains"              1.50
    " concludes"              1.48
    " agrees"                 1.47
    " discovers"              1.45
    " begins"                 1.44
    " warns"                  1.44
    " decides"                1.44
    " realizes"               1.43
    " introduces"             1.43
    Activation Density: 0.455%

    No Known Activations