Explanations
references to family relationships and interactions
Negative Logits
nên        -0.17
proven     -0.16
oli        -0.15
APA        -0.14
ernes      -0.14
tweeting   -0.14
oku        -0.14
proved     -0.13
alley      -0.13
astes      -0.13
Positive Logits
loved      0.26
Loved      0.26
used       0.22
would      0.21
liked      0.20
_used      0.20
always     0.20
often      0.20
used       0.19
liked      0.19
Activations Density 0.159%