INDEX
Explanations
countries, political figures, and government-related terms
proper nouns related to geopolitical issues and countries
Negative Logits
mble        -0.62
Niet        -0.61
indu        -0.52
Hiroshima   -0.50
yip         -0.49
RW          -0.49
pires       -0.47
}"          -0.46
veyard      -0.46
tumblr      -0.46
Positive Logits
's          0.95
coffers     0.68
ís          0.65
because     0.59
amid        0.58
whereby     0.57
Care        0.56
throughout  0.56
through     0.54
lately      0.54
Activations Density 0.471%