INDEX
Explanations
expressions of initial impressions or judgments
Negative Logits
Token          Logit
dam            -0.15
advertisement  -0.15
ryn            -0.14
شن             -0.14
c              -0.14
either         -0.14
owns           -0.14
stabilized     -0.14
suddenly       -0.14
zw             -0.14
Positive Logits
Token       Logit
initially   0.20
Initially   0.18
нач         0.18
Initially   0.17
/release    0.16
bé          0.15
nạn         0.14
.ns         0.14
>",         0.14
utherland   0.14
Activations Density 0.055%