Explanations
the name "Alex" with varying levels of importance or context
Negative Logits
Token         Logit
otherwise     -0.73
recy          -0.66
prevail       -0.59
combust       -0.58
germ          -0.58
finer         -0.57
itiveness     -0.57
initiation    -0.57
altogether    -0.56
liness        -0.56
Positive Logits
Token         Logit
ei            1.23
andra         1.15
andre         1.11
opoulos       1.05
ey            1.03
iev           0.95
andr          0.95
sand          0.94
ildo          0.92
iov           0.86
Activations Density: 0.026%