Explanations
names of cities
instances of the letter "w"
NEGATIVE LOGITS
uate            -0.70
paraly          -0.66
mosqu           -0.66
distingu        -0.65
conscientious   -0.64
culp            -0.63
retri           -0.61
unpre           -0.61
puzz            -0.61
arial           -0.61
POSITIVE LOGITS
elcome          1.40
itness          1.38
atts            1.32
isdom           1.20
ashington       1.19
restling        1.19
atcher          1.18
izard           1.16
addle           1.12
atson           1.12
Activations Density: 0.036%