Explanations
references to historical figures and events
Negative Logits
ed      -1.40
ever    -1.23
ised    -1.16
ities   -1.14
ized    -1.13
ists    -1.08
ist     -1.06
ette    -1.04
ef      -0.99
ize     -0.98
Positive Logits
ations    0.56
ors       0.53
ory       0.52
aneous    0.50
ational   0.50
orial     0.49
arian     0.48
ation     0.47
ative     0.45
arians    0.45
Activations Density: 7.243%