INDEX
    Explanations

    phrases indicating guidance or instruction

    requests and directives for action or compliance

    Negative Logits
    )—              -0.84
    "—              -0.81
     Rodham         -0.77
    ËĪ              -0.72
    SAN             -0.71
    \.              -0.71
    —"              -0.69
     slam           -0.66
    AFP             -0.66
    ickle           -0.65
    Positive Logits
    Additionally    0.92
     furthermore    0.88
    Example         0.86
     Additionally   0.85
     additionally   0.85
     Example        0.85
     drawback       0.84
     Examples       0.83
    cknowled        0.81
     exceptions     0.81
    Activation Density: 0.455%

    No Known Activations