INDEX
Explanations
phrases reflecting personal opinions or beliefs
Negative Logits
ідно     -0.16
dre      -0.15
stroy    -0.13
шов      -0.13
ibrary   -0.13
ャ       -0.13
ells     -0.12
iko      -0.12
boyc     -0.12
пор      -0.12
Positive Logits
talking     0.69
referring   0.55
talk        0.54
Talking     0.52
refer       0.49
Talking     0.49
speaking    0.48
-talk       0.44
refers      0.43
talks       0.42
Activations Density 0.201%