Explanations
mentions of various countries and political figures
references to historical figures, colors, and various social and fantasy themes
Negative Logits
Token             Logit
76561             -0.70
acknowledgement   -0.65
quartered         -0.62
Bris              -0.60
eatures           -0.60
GoldMagikarp      -0.59
etheless          -0.58
Authorization     -0.57
diplomatic        -0.56
Verb              -0.55
Positive Logits
Token             Logit
etc               1.33
respectively      1.08
or                0.94
!/                0.83
/$                0.80
etc               0.79
versa             0.78
blah              0.78
versus            0.77
*.                0.76
Activations Density 0.402%
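
The dashboard does not define the density figure. Assuming it means the share of sampled tokens on which the feature has a nonzero activation, it could be computed roughly as in the sketch below. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 2 of 500 tokens.
sample = [0.0] * 498 + [1.7, 0.3]
print(f"{activation_density(sample):.3%}")  # prints 0.400%
```

Under this reading, a density of 0.402% would mean the feature fires on roughly 4 of every 1,000 sampled tokens.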