    Explanations

    adjectives that express evaluation or judgment

    repetitive or placeholder content without specific descriptive elements

    Negative Logits
        Token (quoted)  Logit
        " Hanson"       -0.63
        " Barrett"      -0.61
        "Ó"             -0.61
        "ILA"           -0.59
        " Vaugh"        -0.59
        " sanctioned"   -0.58
        " Hert"         -0.58
        " Levine"       -0.56
        " Thornton"     -0.56
        " Mobil"        -0.54
    Positive Logits
        Token (quoted)  Logit
        "_"             0.75
        "]["            0.72
        " !!"           0.68
        " !"            0.67
        " )]"           0.67
        "enough"        0.65
        " ];"           0.64
        " enough"       0.64
        "cookie"        0.64
        " ]"            0.64
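Each list above appears to hold the ten most negative and the ten most positive entries of the feature's per-token logit effect. Below is a minimal sketch of that selection step. It assumes the effects are already available as a mapping from token string to value. The function name `top_logit_effects` and the toy data are hypothetical, and this sketch does not show how the dashboard computes the underlying effects.

```python
import heapq


def top_logit_effects(
    effects: dict[str, float], k: int = 10
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs.

    Hypothetical helper: assumes `effects` maps each token string
    (leading space included) to its logit effect.
    """
    # nsmallest returns items in ascending order, most negative first.
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    # nlargest returns items in descending order, most positive first.
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive


if __name__ == "__main__":
    # Toy subset of the values listed above; leading spaces are part of the tokens.
    effects = {" Hanson": -0.63, " Barrett": -0.61, "_": 0.75, " ][": 0.72, " !!": 0.68}
    neg, pos = top_logit_effects(effects, k=2)
    print(neg)
    print(pos)
```

A heap-based selection avoids sorting the whole vocabulary when only the extremes are needed.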
    Activation Density: 0.195%
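Activation density is the share of sampled token positions on which the feature fires. Below is a minimal sketch of that calculation. It assumes "fires" means an activation strictly above zero. The threshold, the function name `activation_density`, and the sample data are assumptions, and the dashboard's sampling procedure is not given here.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: `threshold=0.0` is an assumed definition of "active".
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is active under the assumed threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 2 of 1,000 positions fire, giving 0.2%.
    acts = [0.0] * 998 + [1.3, 0.4]
    print(f"Activation density: {activation_density(acts):.3f}%")
```

At 0.195%, the feature would be active on roughly 2 of every 1,000 tokens.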

    No Known Activations