INDEX
Explanations
phrases that express general awareness or common knowledge
Negative Logits
Token     Logit
ลาย       -0.15
막        -0.15
Desk      -0.15
ワ        -0.14
entrant   -0.14
/tiny     -0.14
wick      -0.14
etting    -0.14
racak     -0.13
untime    -0.13
Positive Logits
Token     Logit
know      0.34
knows     0.30
known     0.30
known     0.28
Know      0.27
know      0.25
知        0.25
-known    0.25
知        0.24
Know      0.24
Activations Density 0.129%
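
Non-ASCII tokens in logit lists like the ones above are often stored in a byte-level BPE display alphabet, where each UTF-8 byte is shown as one printable character. For instance, the raw vocabulary string `çŁ¥` would correspond to 知 ("to know"), which fits the positive logits. The sketch below assumes the tokenizer uses the GPT-2 style byte-to-character table; the dashboard does not state which tokenizer it uses, so treat that as an assumption. The helper name `decode_token` is hypothetical and only for illustration.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each of the 256 byte values
    to a single printable character (assumed tokenizer convention)."""
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Hypothetical helper: turn a byte-level display string back into text.

    Strings containing characters outside the table (already-decoded text
    such as Thai script) are returned unchanged.
    """
    if any(ch not in UNICODE_TO_BYTE for ch in raw):
        return raw
    data = bytes(UNICODE_TO_BYTE[ch] for ch in raw)
    # Partial multi-byte tokens may not be valid UTF-8 on their own.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["çŁ¥", "ãĥ¯", "ë§ī", "Ġknow", "ลาย"]:
        print(repr(raw), "->", repr(decode_token(raw)))
```

Under this assumption, `Ġ` decodes to a leading space, which would also explain why the same word can appear more than once in a logit list: entries with and without a leading space are distinct tokens that look identical once whitespace is trimmed.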