Explanations
pronouns and their usage in sentences
Head Attribution Weights
Head  Weight
0     0.10
1     0.03
2     0.03
3     0.06
4     0.07
5     0.04
6     0.05
7     0.02
8     0.22
9     0.25
10    0.03
11    0.03
Negative Logits
Antar         -1.81
Rutherford    -1.74
rily          -1.74
Mars          -1.72
Hotel         -1.70
Schwar        -1.67
hotel         -1.67
anwhile       -1.65
Balloon       -1.63
Patron        -1.63
Positive Logits
respect           2.04
quest             1.94
ribution          1.90
politics          1.85
izations          1.84
erest             1.84
considerations    1.83
ogie              1.82
phies             1.81
ado               1.78
Activation Density: 0.001%