INDEX
Explanations
mentions of specific places or entities associated with actions or outcomes
Head Attribution Weights
Head  Weight
0     0.05
1     0.02
2     0.14
3     0.06
4     0.14
5     0.07
6     0.02
7     0.02
8     0.23
9     0.12
10    0.04
11    0.03
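The head weights above can be ranked to see which attention heads dominate this feature. This is a minimal Python sketch that uses the listed values. It assumes each weight is the share of the feature's attribution assigned to that head, which is a reading of the dashboard label rather than a documented definition. The name `top_heads` is hypothetical.

```python
# Head attribution weights copied from the dashboard (head index -> weight).
HEAD_WEIGHTS = {
    0: 0.05, 1: 0.02, 2: 0.14, 3: 0.06, 4: 0.14, 5: 0.07,
    6: 0.02, 7: 0.02, 8: 0.23, 9: 0.12, 10: 0.04, 11: 0.03,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights.

    Ties are broken by the lower head index so the ordering is deterministic.
    """
    return sorted(weights.items(), key=lambda hw: (-hw[1], hw[0]))[:k]


if __name__ == "__main__":
    for head, weight in top_heads(HEAD_WEIGHTS, k=4):
        print(f"head {head:2d}: {weight:.2f}")
    # The listed weights are rounded to two decimals, so this total is approximate.
    print(f"sum of listed weights: {sum(HEAD_WEIGHTS.values()):.2f}")
```

With these values, head 8 (0.23) ranks first, followed by heads 2 and 4 (0.14 each) and head 9 (0.12). Together those four heads account for 0.63 of the listed total.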
Negative Logits
Token      Logit
wordpress  -1.31
️          -1.17
Gujar      -1.15
osexual    -1.14
llo        -1.14
advoc      -1.07
shaved     -1.07
��         -1.04
dump       -1.03
aiman      -1.02
Positive Logits
Token      Logit
ordes      1.45
aez        1.28
entials    1.25
phia       1.19
ioxide     1.19
Pieces     1.14
ensibly    1.10
enhagen    1.04
Newton     1.04
Territory  1.03
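Positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that procedure. The names `decoder_dir`, `W_U`, `top_and_bottom_logits`, and the toy vocabulary are hypothetical stand-ins, and the toy numbers are not values from this feature.

```python
import numpy as np


def top_and_bottom_logits(decoder_dir, W_U, vocab, k=3):
    """Project a feature direction onto the vocabulary and return the
    k most positive and k most negative (token, logit) pairs.

    decoder_dir: shape (d_model,)             -- hypothetical feature direction
    W_U:         shape (d_model, vocab_size)  -- hypothetical unembedding matrix
    vocab:       list of vocab_size token strings
    """
    logits = decoder_dir @ W_U          # shape (vocab_size,)
    order = np.argsort(logits)          # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up numbers (d_model=2, vocab_size=5).
vocab = ["a", "b", "c", "d", "e"]
W_U = np.array([[1.0, -1.0, 0.5, 0.0, 2.0],
                [0.0, 1.0, 0.5, -2.0, 0.0]])
decoder_dir = np.array([1.0, 0.5])
positive, negative = top_and_bottom_logits(decoder_dir, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Here the projected logits are [1.0, -0.5, 0.75, -1.0, 2.0]. The sketch therefore reports `e` and `a` as the most positive tokens and `d` and `b` as the most negative.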
Activation Density: 0.005%
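Activation density is the percentage of tokens on which the feature fires. A density of 0.005% therefore means the feature is active on roughly 5 of every 100,000 tokens. This is a minimal sketch that assumes "fires" means an activation strictly above a threshold of zero. Both the threshold and the function name `activation_density` are illustrative assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 5 active tokens out of 100,000.
acts = [0.0] * 99_995 + [1.0] * 5
print(f"{activation_density(acts):.3f}%")
```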