INDEX
Explanations
proper nouns of people's names
mentions of names, particularly focusing on the name "Roth"
Negative Logits
tics             -0.69
Morsi            -0.66
PLA              -0.65
Demand           -0.64
Cortana          -0.63
Tube             -0.62
Dota             -0.62
Advertisement    -0.61
Tibet            -0.61
ATIONS           -0.58
Positive Logits
bard              1.22
bart              1.11
stein             1.10
kj                0.99
enei              0.96
kefeller          0.95
schild            0.93
cliffe            0.93
child             0.87
ough              0.86
Activations Density 0.067%