INDEX
Explanations
the presence of the name "Konstantin" or similar variations
Negative Logits
dana      -0.18
eon       -0.17
eel       -0.17
ei        -0.15
eled      -0.15
ió        -0.15
ENCIES    -0.15
ebo       -0.15
eous      -0.15
érica     -0.14
Positive Logits
rad       0.21
stant     0.19
ardy      0.19
igs       0.18
σταν      0.15
roe       0.15
penn      0.15
supern    0.15
zept      0.15
ter       0.15
Activations Density: 0.010%