Explanations
phrases indicating requests or invitations for feedback or opinions
Negative Logits
token          logit
understanding  -0.27
Understanding  -0.24
understood     -0.23
Understanding  -0.22
rozum          -0.22
familiarity    -0.21
Awareness      -0.19
awareness      -0.19
Knowledge      -0.18
understand     -0.18
Positive Logits
token  logit
kn     0.27
know   0.21
kon    0.20
k      0.20
_k     0.18
km     0.18
Âłk    0.17
Know   0.16
No     0.16
.k     0.16
Activations Density: 0.095%