INDEX
Explanations
names of specific individuals
instances of the word "lie" and its variations
Negative Logits
idental     -0.83
okers       -0.80
kefeller    -0.76
isco        -0.76
ocative     -0.74
ugal        -0.71
izoph       -0.70
ional       -0.70
iance       -0.70
ively       -0.69
Positive Logits
gie         1.10
utenant     1.03
lla         0.92
orie        0.91
llo         0.90
▬           0.85
tto         0.80
ga          0.80
ffe         0.80
lette       0.79
Activations Density 0.019%