INDEX
Explanations
expressions of permission or requests to allow actions
Negative Logits
familiarity    -0.22
understanding  -0.21
Knowledge      -0.20
Understanding  -0.19
Awareness      -0.19
awareness      -0.19
rozum          -0.18
aware          -0.18
knowledge      -0.18
understood     -0.17
Positive Logits
kn      0.25
kon     0.17
k       0.17
km      0.16
itesse  0.15
now     0.15
Kn      0.15
умов    0.14
_k      0.14
Bliss   0.14
Activations Density 0.086%