    Explanations

    statements expressing opinion or commentary

    Negative Logits
        Token             Logit
        "arez"            -0.71
        "orse"            -0.68
        "acca"            -0.68
        "ortium"          -0.67
        "ornia"           -0.67
        "bes"             -0.65
        "uala"            -0.65
        "yna"             -0.64
        "undrum"          -0.61
        " Journalists"    -0.61
    Positive Logits
        Token             Logit
        " nonetheless"    1.37
        "etheless"        1.22
        " nevertheless"   1.16
        " alas"           0.86
        " beware"         0.84
        " darn"           0.81
        " anyways"        0.73
        " damn"           0.73
        " doesnt"         0.72
        " prevailed"      0.71
    Activation density: 0.354%

    No known activations.