INDEX
Explanations
expressions of gratitude and acknowledgment
Negative Logits
â̦        -0.19
..        -0.17
â̦.       -0.16
Ìģ        -0.16
‘         -0.15
↵↵↵       -0.15
↵↵↵↵      -0.15
↵ ↵       -0.15
[â̦]      -0.15
Fucking   -0.15
Positive Logits
(ph       0.27
sort      0.23
kind      0.23
-         0.22
quote     0.19
sort      0.19
-,        0.19
kind      0.18
-↵        0.17
quote     0.17
Activations Density 0.013%