INDEX
    Explanations

    praise or criticism for individuals' performances

    references to notable individuals and their impact or behavior

    Negative Logits
    etheless    -0.95
    ''.         -0.74
    ":{"        -0.73
    "))         -0.70
    )))         -0.70
    "!          -0.68
    ]).         -0.66
    "?          -0.65
     attRot     -0.64
    )).         -0.63
    Positive Logits
     whereas      0.62
     averaging    0.57
     replaced     0.55
     paired       0.55
     overhead     0.53
     kios         0.51
     upfront      0.51
     satur        0.50
     reps         0.50
     VM           0.49
    Act Density 2.391%

    No Known Activations