Explanations
words related to specific types of cheese
Negative Logits
ãĢľ
-0.20
"
-0.18
—"
-0.17
'
-0.17
&
-0.16
Âĵ
-0.16
ưu
-0.16
(~
-0.15
âĢħ
-0.15
("
-0.15
Positive Logits
--
0.41
--↵
0.35
>>
0.30
?>>
0.30
--↵↵
0.28
.--
0.28
(--
0.28
>>
0.27
--
0.25
>--
0.24
Activations Density 0.000%