Explanations
phrases indicating understanding or familiarity with a subject
Negative Logits
ingen           -0.16
unker           -0.15
andum           -0.15
ÑıÑĩ            -0.14
áš              -0.14
.bool           -0.14
ÙĤÙĬ            -0.13
Verification    -0.13
éħ              -0.13
lix             -0.13
Positive Logits
understanding   0.50
know            0.50
knowledge       0.49
knows           0.48
understand      0.48
understands     0.48
çŁ¥éģĵ          0.42   (byte-level form of 知道, Chinese for "to know")
knowing         0.42
knew            0.42
KNOW            0.41
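
Several tokens in the lists above (for example çŁ¥éģĵ or ÑıÑĩ) look garbled because they appear to be shown in a byte-level BPE vocabulary form, where each raw byte is mapped to a printable character. The sketch below assumes the GPT-2 style byte-to-character table; the page does not say which tokenizer produced these tokens, so treat that as an assumption. The helper name `decode_token` is hypothetical and only for illustration.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character.

    Assumption: the dashboard's tokenizer uses this same mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token):
    """Convert a byte-level vocabulary string back to readable text.

    Tokens containing characters outside the table are returned unchanged,
    since they were probably not in byte-level form to begin with.
    Incomplete UTF-8 sequences are rendered with the replacement character.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["çŁ¥éģĵ", "ÑıÑĩ", "ÙĤÙĬ", "know"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, çŁ¥éģĵ decodes to 知道 ("to know"), which fits the other positive-logit tokens. A token such as éħ decodes to an incomplete UTF-8 sequence, which suggests it is a fragment of a multi-byte character and not a full word.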
Activations Density 0.425%