Explanations
instances of characters engaging in social interactions or exchanges
Negative Logits
untime   -0.18
озв      -0.16
apore    -0.15
antz     -0.14
umper    -0.14
ONUS     -0.14
сли      -0.14
ίζ       -0.13
ини      -0.13
thon     -0.13
Positive Logits
knowing     0.23
resigned    0.23
satisfied   0.21
quick       0.20
practiced   0.20
look        0.20
sheep       0.19
concerned   0.19
defeated    0.19
firm        0.19
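Logit lists like the two above are typically computed as the direct effect of the feature's decoder direction on the output logits. The direction is projected through the model's unembedding matrix, and the most negative and most positive entries are kept. Below is a minimal sketch in Python with NumPy. It assumes a decoder vector decoder_dir of shape (d_model,) and an unembedding W_U of shape (d_model, vocab_size). The helper top_logit_effects is hypothetical. Real dashboards may also fold in layer normalization or other scaling, so its values need not match the numbers listed here.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Hypothetical helper. Assumes:
      decoder_dir: shape (d_model,), the feature's decoder direction
      W_U:         shape (d_model, vocab_size), the unembedding matrix
      vocab:       list of token strings, indexed by token id
    """
    # Direct effect of the decoder direction on every token's logit.
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(effects)  # token ids sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 4-token vocabulary.
    W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                    [0.0, 2.0, -0.5, 1.0]])
    vocab = ["a", "b", "c", "d"]
    neg, pos = top_logit_effects(np.array([0.2, 0.1]), W_U, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Sorting once and reading both ends of the order keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, np.argpartition would avoid a full sort.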
Activation Density 0.152%
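Activation density is commonly reported as the share of sampled token positions on which the feature is active, meaning its activation is above zero. On that reading, 0.152% works out to roughly 1.5 tokens per thousand. Below is a minimal sketch. It assumes a flat array of per-token activations and a zero threshold. activation_density and format_density are hypothetical names, not part of any dashboard API.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds threshold.

    Hypothetical helper; assumes `activations` holds one activation value per
    token position sampled from a reference dataset.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


def format_density(density):
    """Render a density fraction as a percentage with three decimals."""
    return f"{100 * density:.3f}%"


if __name__ == "__main__":
    # 1,520 firing positions out of 1,000,000 sampled tokens.
    acts = np.zeros(1_000_000)
    acts[:1_520] = 1.0
    print(format_density(activation_density(acts)))  # prints 0.152%
```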