Explanations
references to political and social hierarchies
Negative Logits
".      -0.75
+.      -0.74
'.      -0.69
$.      -0.68
".      -0.65
!".     -0.65
"!      -0.63
."      -0.60
";      -0.60
?".     -0.59
Positive Logits
pires           0.72
allied          0.66
pired           0.63
others          0.63
countless       0.60
successors      0.59
assorted        0.58
cohorts         0.56
commentators    0.56
venerable       0.56
Activations Density 0.233%