INDEX
Explanations
quantitative measurements and data
Negative Logits
Dem       -0.17
esty      -0.15
udiant    -0.14
Fate      -0.14
nes       -0.14
IFO       -0.14
vail      -0.14
sene      -0.14
oyer      -0.14
ujet      -0.14
Positive Logits
olit      0.15
ppers     0.14
integral  0.14
iman      0.14
unk       0.13
gens      0.13
ARIO      0.13
æĺ¨       0.13
Firearms  0.13
autiful   0.13
Activations Density: 0.126%