Explanations
person names
references to the name "Wang."
Negative Logits
phia      -0.83
*/(       -0.76
pent      -0.73
phis      -0.71
byter     -0.68
ctic      -0.66
odic      -0.66
tarians   -0.65
Camden    -0.64
TAIN      -0.63
Positive Logits
Bei       1.04
Wang      1.02
Chao      1.02
zhou      0.99
wana      0.98
Jian      0.97
jing      0.96
Fei       0.95
Xia       0.94
Peng      0.94
Activations Density: 0.009%