Explanations
The presence of the word "only."
Head Attr Weights
0:0.07
1:0.09
2:0.08
3:0.09
4:0.08
5:0.08
6:0.07
7:0.08
8:0.08
9:0.07
10:0.08
11:0.09
Negative Logits
GBT           -2.91
Women         -2.88
nai           -2.58
Parties       -2.57
feminism      -2.52
天            -2.52
YL            -2.43
Reform        -2.36
Breaking      -2.35
orses         -2.35
Positive Logits
olis          2.85
perf          2.78
Takeru        2.77
resso         2.72
Temp          2.52
passer        2.52
stub          2.50
scans         2.47
tasted        2.38
fingerprints  2.35
Activations Density 0.000%