INDEX
Explanations
concepts related to theoretical frameworks or models
Head Attr Weights
0:0.04
1:0.02
2:0.07
3:0.09
4:0.02
5:0.04
6:0.13
7:0.21
8:0.05
9:0.05
10:0.06
11:0.16
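
One way to read the head attribution weights above is as a ranking of heads. The following is a minimal sketch, assuming each `index:value` pair is a per-attention-head attribution weight; the names `head_attr` and `top_heads` are hypothetical and chosen only for illustration. The listed values sum to 0.94, and the source does not say whether this reflects rounding or unnormalized weights, so the sketch reports shares relative to the listed total.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: one weight per attention head, 12 heads in total.
head_attr = {
    0: 0.04, 1: 0.02, 2: 0.07, 3: 0.09, 4: 0.02, 5: 0.04,
    6: 0.13, 7: 0.21, 8: 0.05, 9: 0.05, 10: 0.06, 11: 0.16,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_attr.values())   # sum of the listed weights
top = top_heads(head_attr)        # highest-weight heads first
share = sum(w for _, w in top) / total

print("top heads:", top)
print(f"listed total: {total:.2f}, top-3 share of listed total: {share:.2f}")
```

Under these assumptions, heads 7, 11, and 6 carry the largest weights and together account for roughly half of the listed total.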
Negative Logits
spons           -1.52
Ukrain          -1.21
appropriately   -1.18
◼               -1.18
pause           -1.16
wagen           -1.10
Roundup         -1.09
mods            -1.07
respect         -1.06
vr              -1.05
Positive Logits
Obj             1.18
meaning         1.14
ographical      1.11
plain           1.04
ography         1.02
phrases         1.01
rencies         1.00
�               0.99
asketball       0.99
constructs      0.98
Activations Density 0.003%