Explanations
instances of the word "no" associated with refusals or negative responses
Head Attr Weights
0:0.03
1:0.01
2:0.09
3:0.05
4:0.13
5:0.03
6:0.07
7:0.34
8:0.04
9:0.03
10:0.08
11:0.04
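A minimal sketch for reading the head attribution weights above, assuming each `head:weight` pair gives that attention head's share of the feature's attribution. The dictionary is copied from the listing; the helper name `rank_heads` is hypothetical and not part of any dashboard API.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.03, 1: 0.01, 2: 0.09, 3: 0.05, 4: 0.13, 5: 0.03,
    6: 0.07, 7: 0.34, 8: 0.04, 9: 0.03, 10: 0.08, 11: 0.04,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]
# The listed values are rounded, so their sum need not be exactly 1.
total = sum(head_weights.values())
print(f"top head: {top_head} ({top_weight:.2f} of {total:.2f} listed)")
```

On these numbers head 7 carries the largest single weight, with head 4 a distant second.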
Negative Logits
largeDownload   -1.73
lance           -1.64
Spread          -1.63
quartered       -1.57
Winged          -1.57
ゼウス          -1.55
andise          -1.52
Annotations     -1.51
aunder          -1.49
Appearance      -1.49
Positive Logits
blockers        1.54
submissions     1.53
rejected        1.45
proposals       1.44
olean           1.44
affirmative     1.43
objection       1.43
deserving       1.41
rejection       1.41
gging           1.40
Activations Density 0.006%
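To put the density figure in concrete terms, here is a small sketch, assuming "Activations Density" means the percentage of token positions on which the feature has a nonzero activation (the page itself does not define the term). The function name `expected_active_tokens` is hypothetical.

```python
def expected_active_tokens(density_percent, n_tokens):
    """Expected number of tokens on which the feature fires.

    Assumes density_percent is the share of token positions with a
    nonzero activation, expressed as a percentage.
    """
    return density_percent / 100.0 * n_tokens


# Density taken from the line above; the corpus size is an arbitrary example.
active = expected_active_tokens(0.006, 1_000_000)
print(f"about {active:.0f} active tokens per million")
```

Under that assumption the feature is very sparse, firing on roughly 6 of every 100,000 tokens.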