INDEX
Explanations
names of characters and references to family relationships
Negative Logits
…↵
-0.25
…↵↵
-0.25
...↵↵
-0.24
..↵
-0.24
..↵↵
-0.21
...↵
-0.21
...,
-0.20
..."
-0.19
-0.19
：↵↵
-0.17
Positive Logits
↵
0.50
↵ ↵
0.45
č↵
0.37
↵ ↵
0.36
↵↵↵
0.36
↵↵
0.33
↵
0.30
↵
0.27
č↵
0.24
č↵č↵
0.23
Activations Density 0.015%