Explanations
phrases related to warnings and legal notices
Negative Logits
yna       -0.16
agas      -0.16
agna      -0.16
erk       -0.14
/forum    -0.14
_misc     -0.14
leston    -0.13
endor     -0.13
homer     -0.13
msp       -0.13
Positive Logits
warnings  0.41
warning   0.40
Warning   0.36
warned    0.35
Warn      0.34
warnings  0.33
Warning   0.33
warn      0.32
warn      0.31
WARN      0.31
Activations Density: 0.141%