INDEX
Explanations
references to historical figures, specifically focusing on mentions of Theodore Roosevelt
mentions of the name "Roosevelt" and its related references
Negative Logits
leon        -0.74
las         -0.72
semble      -0.71
LESS        -0.70
raints      -0.69
ateurs      -0.66
hem         -0.66
rav         -0.65
ning        -0.64
phabet      -0.62
Positive Logits
Roosevelt   1.23
mosqu       0.92
enthal      0.92
velt        0.82
iets        0.79
hower       0.77
éĹĺ         0.76
enstein     0.75
dinand      0.72
udeau       0.71
Activations Density 0.011%