INDEX
Explanations
specific character strings, possibly looking for non-standard or special characters in the text
Negative Logits
ason     -0.16
sak      -0.15
lệ       -0.15
_cre     -0.15
šil      -0.14
uegos    -0.14
589      -0.14
ayload   -0.14
нада     -0.14
avirus   -0.14
Positive Logits
single            0.20
responsibility    0.19
Single            0.17
SINGLE            0.17
responsibilities  0.17
tester            0.16
tests             0.16
live              0.16
low               0.16
single            0.16
Activations Density 0.008%
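
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not a statement of how this dashboard computes it; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumed definition for illustration only; the dashboard's exact
    computation is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 8 of 100,000 token positions.
sample = [0.0] * 99_992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this definition, a density of 0.008% would mean the feature fires on roughly 8 of every 100,000 tokens, which is typical of a sparse feature.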