Explanations
names of people and locations, especially related to historical events
Negative Logits
somet        -0.22
Swordsman    -0.21
pole         -0.21
ulhu         -0.20
isEnabled    -0.20
inosaur      -0.20
scales       -0.20
epile        -0.19
ensor        -0.19
ritic        -0.19
Positive Logits
lain         0.23
hua          0.23
gar          0.22
ira          0.22
aii          0.22
lla          0.22
edu          0.22
Blanc        0.21
gars         0.21
ously        0.21
Activations Density: 14.666%