INDEX
Explanations
references to horns or horned animals
Negative Logits
gn      -0.19
hta     -0.16
gart    -0.16
genic   -0.15
gni     -0.15
teri    -0.14
tere    -0.14
zure    -0.14
caret   -0.14
spot    -0.14
Positive Logits
beam    0.25
pipe    0.25
aday    0.23
ung     0.23
ed      0.23
ady     0.21
bill    0.21
et      0.21
itos    0.21
bl      0.20
Activations Density: 0.021%