INDEX
Explanations
mentions of specific names
proper nouns, particularly names of people or organizations
Negative Logits
士          -0.82
hedral      -0.76
merce       -0.72
éĹĺ         -0.71
racuse      -0.69
Dispatch    -0.66
urus        -0.65
olina       -0.65
swick       -0.65
Crescent    -0.65
Positive Logits
boycot      0.61
delay       0.61
beard       0.59
Buffett     0.59
Minion      0.57
boycott     0.56
ado         0.56
famine      0.56
Legendary   0.54
Lear        0.54
Activations Density 0.137%