Explanations
mentions of a person named "Wang"
references to a specific individual named Wang
Negative Logits
phia      -0.89
TAIN      -0.75
tarian    -0.71
tarians   -0.69
vation    -0.66
byter     -0.65
judicial  -0.65
inally    -0.65
clair     -0.64
pent      -0.64
Positive Logits
Chao      1.01
wana      0.98
jing      0.91
Bei       0.89
Jian      0.88
Xia       0.86
Fei       0.84
zhou      0.84
enegger   0.84
sung      0.83
Activations Density 0.019%