Explanations
mentions of social media handles or usernames
Head Attr Weights
0:0.09
1:0.04
2:0.05
3:0.05
4:0.04
5:0.05
6:0.22
7:0.05
8:0.04
9:0.24
10:0.04
11:0.04
Negative Logits
Calvin      -3.95
Pir         -3.92
pir         -3.90
Pir         -3.75
Mus         -3.58
Lucius      -3.39
Boko        -3.30
pirates     -3.30
Mut         -3.29
------      -3.25
Positive Logits
Kessler     8.92
essler      6.63
Shelter     4.10
Vest        3.81
Higgins     3.73
Kahn        3.40
DK          3.39
Sutherland  3.37
Garrison    3.37
alez        3.35
Activations Density 0.000%