Explanations
Proper nouns, specific names, and terms related to online culture and politics
Negative Logits
Toll          -0.83
VP            -0.74
pity          -0.71
ulton         -0.70
holders       -0.70
¥µ            -0.70
captains      -0.69
folios        -0.68
============  -0.67
Footnote      -0.67
Positive Logits
arro    1.27
agate   1.22
ipedia  1.18
azz     1.06
Buzz    1.04
arella  1.03
arre    1.02
atered  0.99
etta    0.96
eria    0.93
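The two logit lists rank the tokens whose output logits this feature pushes down most (negative) and up most (positive). Below is a minimal sketch of how such lists could be produced. It rests on an assumption that this page does not confirm: each token's value is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The function names, array shapes, and toy vocabulary are hypothetical, and the random numbers are not the values shown above.

```python
import numpy as np

def logit_effects(decoder_dir, W_U):
    """Hypothetical: project a feature's decoder direction (d_model,)
    through an unembedding matrix W_U (d_model, vocab), giving one value per token."""
    return decoder_dir @ W_U

def top_and_bottom(effects, vocab, k):
    """Return the k most negative and the k most positive (token, value) pairs."""
    order = np.argsort(effects)  # indices sorted by ascending value
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy setup: 4-dim residual stream, 5-token vocabulary, random (invented) weights.
rng = np.random.default_rng(0)
W_U = rng.normal(size=(4, 5))
vocab = [f"tok{i}" for i in range(5)]
decoder_dir = rng.normal(size=4)

effects = logit_effects(decoder_dir, W_U)
neg, pos = top_and_bottom(effects, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and reading both ends of the order keeps the two lists consistent: the most negative entry is always the global minimum, and the most positive entry is always the global maximum.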
Activation density: 0.251%
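Activation density is usually read as the share of sampled tokens on which the feature is active. A density of 0.251% therefore means the feature fires on roughly one token in 400. The sketch below assumes that "active" means an activation strictly above a threshold of zero. The `threshold` parameter is a hypothetical knob and the activation values are invented for illustration.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical sample: the feature fires on 2 of 800 tokens.
acts = np.zeros(800)
acts[[10, 500]] = [1.3, 0.4]
print(f"{activation_density(acts):.3f}%")  # prints "0.250%"
```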