Explanations
No Explanations Found
Negative Logits
ienne         -0.68
notor         -0.67
awar          -0.65
improvised    -0.65
usable        -0.64
estone        -0.64
mercenaries   -0.63
rog           -0.62
scaven        -0.62
granite       -0.60
Positive Logits
za            0.85
kef           0.84
tera          0.79
zza           0.75
ocaly         0.75
heit          0.74
ENC           0.72
cens          0.71
zers          0.68
Kendrick      0.67
Activations Density 0.000%
No Known Activations
This feature has no known activations.