INDEX
Explanations
references to influential or notable individuals, particularly in political contexts
attends to the name "John" from the Twitter handle or last name of the person named John.
Head Attr Weights
0:0.05
1:0.02
2:0.05
3:0.04
4:0.06
5:0.03
6:0.13
7:0.05
8:0.06
9:0.42
10:0.02
11:0.03
Negative Logits
Sty      -3.50
Planes   -3.48
ATF      -3.42
Spike    -3.31
Tes      -3.23
●        -3.17
aza      -3.04
Tes      -3.03
Shap     -3.00
Tol      -3.00
Positive Logits
John       7.06
john       6.95
John       6.91
JOHN       6.32
JOHN       6.14
Johnston   5.89
john       5.88
Johns      5.70
Johnson    4.19
Johnson    4.19
Activations Density 0.037%