INDEX
Explanations
Instances of the word "know" and its variants, as markers of conversational awareness and self-reference
Negative Logits
oses         -0.18
oola         -0.17
ẫ            -0.16
egra         -0.15
gratuites    -0.15
field        -0.14
asje         -0.14
zÅij         -0.14
coe          -0.14
gett         -0.14
Positive Logits
éĤ£ç§į       0.15
sometimes    0.15
ometimes     0.14
ÙĪØ§Ø±       0.14
üc           0.14
976          0.14
oming        0.14
ãģĵãģĿ       0.14
IED          0.14
ffset        0.13
Activations Density: 0.023%