Explanations
phrases related to temporal aspects and relationships in discussions
Head Attr Weights
0:0.04
1:0.03
2:0.07
3:0.05
4:0.06
5:0.03
6:0.42
7:0.06
8:0.04
9:0.03
10:0.07
11:0.05
Negative Logits
️:-1.28
inav:-1.23
sshd:-1.22
itsch:-1.22
bleach:-1.18
upstream:-1.17
lessly:-1.15
Diesel:-1.14
jab:-1.14
bud:-1.13
Positive Logits
urai:1.49
�:1.35
enment:1.35
ndra:1.32
alian:1.32
ylum:1.29
�:1.27
Poverty:1.24
�:1.24
alia:1.22
Activations Density: 0.003%