INDEX
Explanations
references to prominent individuals and accounts on social media
Head Attr Weights
0:0.10
1:0.24
2:0.06
3:0.06
4:0.04
5:0.15
6:0.02
7:0.04
8:0.09
9:0.06
10:0.03
11:0.05
Negative Logits
captcha    -1.84
pie        -1.79
idas       -1.66
NOR        -1.57
══         -1.55
=/         -1.52
=$         -1.51
Bound      -1.47
match      -1.44
trip       -1.44
Positive Logits
Lub        1.58
neum       1.54
conclud    1.52
tml        1.50
Huss       1.50
Rear       1.47
testified  1.47
guiName    1.46
counsel    1.46
grave      1.42
Activations Density 0.002%