    Explanations

    phrases related to expressing opinions or giving speeches

    Negative Logits
    Token             Logit
    "Written"         -0.71
    " WATCHED"        -0.69
    " accessed"       -0.65
    ">>>>>>>>"        -0.64
    " Modified"       -0.63
    "idav"            -0.60
    " Rollins"        -0.59
    "nikov"           -0.59
    " Edited"         -0.59
    "cream"           -0.58

    Positive Logits
    Token             Logit
    "irlf"            0.68
    "ilk"             0.63
    "order"           0.61
    "utical"          0.61
    " predecessors"   0.61
    "sum"             0.61
    "romy"            0.60
    " govern"         0.60
    " stride"         0.59
    "ngth"            0.59
    Act Density 0.272%

    No Known Activations