Explanations
instances of character names and their attributes
Negative Logits
Token      Logit
McD        -0.27
ンド       -0.24
bd         -0.23
ルド       -0.22
bd         -0.21
BD         -0.21
GD         -0.21
ド         -0.21
Bd         -0.20
ourd       -0.20
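Tokens such as ンド, ルド and ド come from a byte-level BPE vocabulary, in which every UTF-8 byte is mapped to a printable character. In raw vocabulary dumps ンド therefore appears as ãĥ³ãĥī. The sketch below assumes the GPT-2 `bytes_to_unicode` mapping (an assumption; the page does not name the tokenizer) and converts such display strings back to text. The helper name `decode_token` is hypothetical. The same mapping renders a leading space as `Ġ`, which may explain why `bd` and `ide` each appear twice in these lists if the dashboard strips that space (also an assumption).

```python
def bytes_to_unicode() -> dict:
    """GPT-2 style reversible table: byte value -> printable character."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, 0x7F-0xA0, 0xAD) get code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: turn a byte-level BPE display string into text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥ³ãĥī", "ãĥ«ãĥī", "ãĥī", "McD", "Ġide"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

With this mapping, `ãĥ³ãĥī` decodes to `ンド` (bytes E3 83 B3 E3 83 89) and `Ġide` decodes to `" ide"`.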
Positive Logits
Token      Logit
ide        0.38
ided       0.34
ides       0.32
IDE        0.30
idea       0.29
ideas      0.29
Ide        0.28
side       0.26
ide        0.26
idebar     0.25
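On dashboards of this kind, the positive and negative logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that convention (the page itself does not state it). `feature_logits`, `decoder_vec` and `unembed` are hypothetical names, and the toy numbers are invented for illustration, not taken from the model.

```python
from typing import List, Sequence, Tuple


def feature_logits(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) tokens for one feature.

    decoder_vec: the feature's decoder direction, length d_model.
    unembed: d_model x vocab_size matrix mapping the residual stream to logits.
    """
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with token j's unembedding column.
        s = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        scores.append((tok, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


# Invented toy numbers: d_model = 2, vocabulary of 4 tokens.
vocab = ["ide", "idea", "bd", "McD"]
unembed = [
    [0.4, 0.3, -0.2, -0.1],
    [0.0, 0.2, 0.2, 0.2],
]
decoder_vec = [1.0, -0.5]

neg, pos = feature_logits(decoder_vec, unembed, vocab, k=2)
print("negative:", [t for t, _ in neg])
print("positive:", [t for t, _ in pos])
```

Sorting once and slicing from both ends keeps the example short. For a real vocabulary of tens of thousands of tokens, a partial top-k selection would avoid the full sort.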
Activation density: 0.074%
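Activation density is usually read as the fraction of tokens in a sample dataset on which the feature's activation is above zero. At 0.074%, that is roughly one token in 1,350. The sketch below assumes a threshold of zero (the page does not state one). The function name `activation_density` and the synthetic activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic data: 74 firing tokens out of 100,000.
acts = [0.0] * 99_926 + [0.8] * 74
density = activation_density(acts)
print(f"{density:.3%}")  # 0.074%
```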