Explanations
hex color codes preceded by a '#'
Negative Logits
Token   Logit
g       -0.78
H       -0.76
G       -0.73
w       -0.72
j       -0.72
r       -0.71
M       -0.70
X       -0.70
W       -0.68
V       -0.68
Positive Logits
Token          Logit
Monfieur       0.64
houſe          0.59
ViewFeatures   0.58
poffe          0.57
cauſe          0.55
himſelf        0.54
daad           0.54
chofe          0.54
CEED           0.52
pośred         0.52
Activations Density: 1.719%