INDEX
Explanations
proper nouns starting with "Su"
repeated instances of a specific name or term
Negative Logits
juggling     -0.71
owl          -0.64
halves       -0.62
wright       -0.62
Blumenthal   -0.62
union        -0.60
AE           -0.58
Fired        -0.57
directions   -0.56
unions       -0.56
Positive Logits
icides       1.38
icide        1.35
icidal       1.34
zanne        1.31
arez         1.20
itable       1.17
cc           1.09
pper         1.09
zy           1.06
ited         1.05
Activations Density 0.016%