INDEX
    Explanations

    informational phrases prompting action

    requests for additional information

    Negative Logits
     neighb       -0.68
    artifacts     -0.63
     lifeless     -0.61
    meter         -0.61
    opped         -0.61
    anmar         -0.60
    lihood        -0.60
     impro        -0.58
     throats      -0.56
    odied         -0.55
    Positive Logits
     about        1.18
     regarding    1.15
     ABOUT        1.02
     pertaining   0.98
     concerning   0.96
     About        0.93
    About         0.93
    about         0.85
     Regarding    0.83
     relating     0.80
    Act Density 0.056%

    No Known Activations