Explanations
names of political figures
references to various political leaders and government officials
Negative Logits
ヴァ                  -0.74
ソ                   -0.73
Reviewer            -0.73
à¨                  -0.70
++++++++++++++++    -0.69
Ire                 -0.68
ダ                   -0.65
Tu                  -0.63
docs                -0.62
natureconservancy   -0.62
Positive Logits
Jr                  0.94
bey                 0.85
steen               0.83
oversaw             0.80
enei                0.79
ovich               0.78
III                 0.76
joked               0.74
toured              0.74
hler                0.72
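Positive and negative logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the highest- and lowest-scoring tokens. The sketch below is a minimal pure-Python version under that assumption. The page does not say how its lists were computed, and real pipelines may also fold in the final LayerNorm. The function names, toy vocabulary, and numbers are all hypothetical.

```python
def logit_effects(decoder_dir, w_unembed):
    """Return decoder_dir @ w_unembed: one score per vocabulary token.

    decoder_dir: list of d_model floats (the feature's decoder vector).
    w_unembed:   d_model rows, each a list of vocab_size floats.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, w_unembed))
        for t in range(vocab_size)
    ]


def top_bottom_tokens(effects, vocab, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example (made-up numbers): d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, 0.5]
w_unembed = [
    [0.2, -0.4, 0.9, 0.0],
    [0.6, -0.2, 0.1, -1.2],
]
effects = logit_effects(decoder_dir, w_unembed)
negative, positive = top_bottom_tokens(effects, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

A token like "Jr" or "ovich" scoring high under this projection means that strengthening the feature pushes the model toward predicting that token next.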
Activation Density 0.165%
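Activation density is usually reported as the share of tokens in some evaluation set on which the feature's activation is above zero. Read that way, 0.165% means the feature fires on roughly 1 token in 606. The sketch below assumes that definition; the page names neither the threshold nor the dataset. `activation_density` is a hypothetical helper, and the activations are toy data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 100,000 token positions; the feature fires on 165 of them.
acts = [0.0] * 100_000
for i in range(0, 165 * 600, 600):
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")
print(f"about 1 in {1 / density:.0f} tokens")
```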