INDEX
Explanations
words related to human ancestors and prehistoric lifestyles
Negative Logits
++++      -0.81
cia       -0.75
ioxide    -0.69
rait      -0.67
ysis      -0.66
ours      -0.66
ulhu      -0.66
theirs    -0.64
milo      -0.64
VL        -0.64
Positive Logits
gat       0.84
Forest    0.68
Eight     0.68
learn     0.66
bart      0.66
auld      0.64
tale      0.64
Howard    0.64
poke      0.63
bender    0.62
Activation Density: 0.133%