Explanations
references to deception or misrepresentation in various contexts
Negative Logits (token, logit)
vrier  -0.16
THROW  -0.15
GroupBox  -0.15
ikk  -0.15
lias  -0.14
紙 ("paper")  -0.14
peri  -0.14
提出 ("put forward")  -0.14
wal  -0.13
loose  -0.13
Positive Logits (token, logit)
facts  0.17
reports  0.17
/false  0.16
Jar  0.15
reporting  0.15
reports  0.15
realities  0.14
facts  0.14
about  0.14
ichert  0.14
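The two lists rank vocabulary tokens by how strongly this feature raises or lowers their logits. A minimal sketch of how such lists are commonly produced, assuming (the page itself does not say) that each value is the feature's decoder direction projected through the model's unembedding matrix. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `w_u` are hypothetical, and the toy weights in the check are made up purely for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto the unembedding.

    Hypothetical shapes: decoder_dir has length d_model and
    w_u is d_model x vocab_size (row i = residual dimension i).
    Returns one logit contribution per vocabulary token.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs."""
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Slice from len - k (not -k) so that k == 0 yields an empty list.
    positive = [(vocab[t], effects[t]) for t in reversed(order[len(order) - k:])]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["facts", "reports", "THROW", "loose"]
effects = logit_effects([1.0, 0.5], [[0.1, 0.2, -0.1, 0.0],
                                     [0.2, -0.2, -0.1, -0.2]])
positive, negative = top_and_bottom(effects, vocab, 2)
print(positive)  # facts and reports, largest first
print(negative)  # THROW and loose, most negative first
```

In a real model the projection would run over tens of thousands of tokens with a matrix library; the pure-Python loops here only keep the sketch dependency-free.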
Activation Density: 0.097%
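Activation density is usually read as the fraction of token positions on which the feature fires, so 0.097% corresponds to roughly 97 activating tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and its `threshold` parameter are hypothetical, not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Hypothetical data: 97 nonzero activations spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(97):
    acts[i * 1000] = 1.5
print(f"{100 * activation_density(acts):.3f}%")  # 0.097%
```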