INDEX
Explanations
names with historical significance or notable mentions
Negative Logits
egan        -0.18
.wikipedia  -0.15
inh         -0.15
nite        -0.15
attern      -0.15
Germ        -0.14
odable      -0.14
.Scan       -0.14
adj         -0.14
Mattis      -0.14
Positive Logits
šov         0.17
asher       0.17
fal         0.16
_GU         0.14
ainment     0.14
PID         0.14
scho        0.14
itness      0.14
ø           0.14
pot         0.14
Activations Density 0.000%