Explanations
phrases indicating uncertainty and limited knowledge about a subject
Negative Logits
リア       -0.16
olle      -0.16
lee       -0.14
LEE       -0.14
朋         -0.13
Shields   -0.13
aurus     -0.13
hol       -0.13
roit      -0.13
.shapes   -0.13
Positive Logits
known       0.34
knowledge   0.32
know        0.29
known       0.28
knowledge   0.28
Known       0.27
-known      0.27
Knowledge   0.26
Knowledge   0.25
Known       0.25
Activations Density 0.261%
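
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of token positions on which the feature's activation is above a threshold. The function name `activation_density`, the zero threshold, and the sample activations are all assumptions made for illustration; the page does not state how its 0.261% was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the activation exceeds `threshold`.

    `activations` is a flat list with one feature activation per token.
    The strict `>` comparison and the default threshold of 0.0 are assumptions.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 tokens, with the feature firing on 3 of them.
sample = [0.0] * 997 + [0.8, 1.2, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.261% would mean the feature is active on roughly 26 of every 10,000 tokens in whatever text sample was used.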