INDEX
Explanations
pronouns indicating personal relationships or interactions
Head Attr Weights
0:0.07
1:0.04
2:0.10
3:0.08
4:0.07
5:0.13
6:0.04
7:0.05
8:0.20
9:0.06
10:0.05
11:0.05
Negative Logits
ancest:-1.75
alach:-1.75
challeng:-1.71
pta:-1.61
atown:-1.60
aine:-1.56
hedon:-1.53
odan:-1.52
agine:-1.51
puter:-1.51
Positive Logits
invoke:1.53
ía:1.50
aeus:1.47
ERY:1.45
disconnected:1.43
ICES:1.43
CentOS:1.42
REC:1.42
toggle:1.41
VICE:1.41
Activations Density 0.000%