INDEX
Explanations
mentions of a specific place name
mentions of the name "Nam."
Negative Logits
Token           Logit
Engels          -0.71
understatement  -0.66
forecast        -0.64
Blackwell       -0.62
UID             -0.62
extrap          -0.60
angels          -0.60
devil           -0.59
inhib           -0.58
subparagraph    -0.57
Positive Logits
Token           Logit
nam             1.19
orously         1.01
ned             0.99
ovember         0.98
eless           0.98
essage          0.96
ilitary         0.95
forth           0.93
azing           0.93
emon            0.92
Activations Density 0.019%