Explanations
Names of people, particularly "Goldberg"
Mentions of specific names and of political correctness
Negative Logits
Token     Logit
hur       -0.73
ampa      -0.65
ewater    -0.63
emer      -0.62
Ey        -0.60
TIT       -0.60
arb       -0.60
Mare      -0.60
mir       -0.59
hd        -0.59
Positive Logits
Token         Logit
Goldberg      3.02
correctness   2.09
Gupta         1.18
Jinping       1.16
sq            0.97
aque          0.91
instein       0.88
Enix          0.87
suprem        0.82
Ellison       0.81
Activation Density: 0.028%