Explanations
discussions about understanding and recognizing the distinction between reality and constructed concepts
Negative Logits
ixer       -0.15
έας        -0.15
واب        -0.14
ідом       -0.14
osit       -0.14
Zahl       -0.14
intColor   -0.13
olin       -0.13
ILT        -0.13
زيز        -0.13
Positive Logits
Curtain    0.14
underst    0.14
408        0.14
ema        0.14
rieg       0.14
Poss       0.14
ansson     0.13
ح          0.13
下         0.13
çĩ         0.13
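The positive and negative logit lists rank vocabulary tokens by how strongly the feature pushes their logits up or down. The page does not say how those scores are computed. A common approach for sparse-autoencoder dashboards, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then keep the k highest and k lowest scores. The sketch below illustrates that ranking step. All names (`logit_effects`, `top_and_bottom`, `toy_unembedding`) and the numbers are hypothetical toy data, not values from this feature.

```python
def logit_effects(decoder_direction, unembedding):
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector (a toy stand-in for a column of W_U)."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembedding.items()
    }


def top_and_bottom(scores, k):
    """Return (positive, negative): the k highest- and k lowest-scoring tokens."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative


# Hypothetical toy data: a 3-dimensional model with a four-token vocabulary.
direction = [0.5, -0.2, 0.1]
toy_unembedding = {
    "tok_a": [0.2, -0.1, 0.3],
    "tok_b": [-0.2, 0.3, -0.1],
    "tok_c": [-0.1, 0.2, 0.0],
    "tok_d": [0.1, 0.0, 0.2],
}
scores = logit_effects(direction, toy_unembedding)
positive, negative = top_and_bottom(scores, k=2)
for token, score in positive:
    print(f"+ {token} {score:.2f}")
for token, score in negative:
    print(f"- {token} {score:.2f}")
```

With a real model, the same two functions would run over the full vocabulary with k = 10 to produce lists shaped like the ones above.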
Activation Density: 1.931%
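Activation density is usually reported as the percentage of sampled token positions where the feature is active. The dashboard does not state its firing threshold or sample, so the sketch below makes two assumptions: a feature counts as active when its activation is strictly above zero, and the input is a hypothetical list of per-token activations. Under that definition, a density of 1.931% means the feature fires on roughly one token in 52.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations over ten token positions; the feature fires on one.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3f}%")  # prints "10.000%"

# Inverting the reported figure: 1.931% is about one token in 52.
print(round(100 / 1.931))  # prints 52
```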