INDEX
    Explanations

    phrases indicating contrast or emphasis

    phrases that express limitation or negation

    Negative Logits
    Token          Logit
    "osion"        -0.72
    "cies"         -0.69
    "insula"       -0.65
    " VIDEOS"      -0.62
    "ctors"        -0.60
    " Increases"   -0.59
    "ortmund"      -0.59
    " Blaz"        -0.58
    "soType"       -0.58
    " UW"          -0.57
    Positive Logits
    Token            Logit
    "forth"          0.92
    " employed"      0.74
    "been"           0.74
    " considered"    0.73
    " confined"      0.71
    " entertained"   0.70
    " immortal"      0.70
    "ãĥĩãĤ£"         0.68
    "agree"          0.68
    "always"         0.67
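
The positive-logit entry `ãĥĩãĤ£` looks like a multi-byte token rendered through a byte-level BPE alphabet rather than random debris. The sketch below assumes the vocabulary uses the GPT-2 style byte-to-unicode mapping (the page itself does not say which tokenizer is in use) and shows how such a string can be turned back into readable text. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not taken from the page.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Map each character back to its byte, then decode the bytes as UTF-8."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# Assumption: the dashboard token was produced by this byte-level mapping.
print(decode_token("ãĥĩãĤ£"))
```

Under that assumption the token decodes to the katakana sequence `ディ`. If the model uses a different tokenizer, the mapping would not apply and the string should be left as displayed.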
    Act Density 0.197%
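
As a rough illustration of what a figure like `Act Density 0.197%` could mean, the sketch below assumes that activation density is the fraction of sampled tokens on which the feature fires (activation above a threshold of zero). Both that definition and the function name `activation_density` are assumptions, and the activation values are synthetic, chosen only so that the arithmetic reproduces the displayed percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data (not from the page): 100,000 tokens, 197 of them active.
acts = [0.0] * 100_000
for i in range(197):
    acts[i * 500] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

With 197 active tokens out of 100,000 the function returns 0.00197, which formats as 0.197%.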

    No Known Activations