INDEX
Explanations
events, places, and names of specific people in news and sports
references to major events and figures in sports and politics
Negative Logits
minist      -0.63
xual        -0.63
zsche       -0.59
ratom       -0.59
yss         -0.56
MpServer    -0.55
uscript     -0.54
glim        -0.51
outine      -0.51
stories     -0.49
Positive Logits
.''.            0.61
.).             0.58
.''             0.56
fame            0.55
''.             0.54
respectively    0.51
*.              0.50
]."             0.50
).              0.48
."              0.48
Activations Density: 1.782%