INDEX
    Explanations

    phrases related to warnings and legal notices

    Negative Logits
    Token         Logit
    "yna"         -0.16
    "agas"        -0.16
    "agna"        -0.16
    "erk"         -0.14
    "/forum"      -0.14
    "_misc"       -0.14
    "leston"      -0.13
    "endor"       -0.13
    " homer"      -0.13
    "msp"         -0.13
    Positive Logits
    Token         Logit
    " warnings"   0.41
    " warning"    0.40
    " Warning"    0.36
    " warned"     0.35
    " Warn"       0.34
    "warnings"    0.33
    "Warning"     0.33
    " warn"       0.32
    "warn"        0.31
    " WARN"       0.31
    Activation Density: 0.141%

    No Known Activations