Explanations
proper names
mentions of specific individuals, particularly those with the last names Peters or Cousins
Negative Logits
izen            -0.80
TPS             -0.75
ocking          -0.67
************    -0.67
icity           -0.66
iversity        -0.63
EMBER           -0.62
覚醒            -0.61
VIDIA           -0.61
isse            -0.61
Positive Logits
Cousins         0.95
ername          0.80
opher           0.80
gaard           0.79
kus             0.77
hus             0.77
workshop        0.76
chal            0.75
bats            0.74
ystem           0.72
Activations Density 0.020%