Explanations
proper nouns and terms related to conspiracy theories, such as "Kendricks" and "tin-foil-hat-wearing conspiracy nuts"
instances of counting or measurement-related expressions
Negative Logits
abase        -0.73
thous        -0.67
@#&          -0.65
MpServer     -0.64
[];          -0.63
ultraviolet  -0.63
remem        -0.62
symp         -0.62
azeera       -0.62
undy         -0.61
Positive Logits
bos          0.76
esty         0.74
los          0.67
bilt         0.65
liness       0.65
ocratic      0.64
operation    0.64
×Ļ×          0.63
oric         0.62
antics       0.61
Activations Density 0.000%