INDEX
Explanations
No Explanations Found
Negative Logits
Token         Logit
fect          -0.72
die           -0.68
sans          -0.64
Fidel         -0.63
perish        -0.60
dissenting    -0.60
ives          -0.60
Constantine   -0.59
ynski         -0.59
光            -0.59
Positive Logits
Token         Logit
ihara         0.77
ffield        0.72
days          0.70
ilk           0.67
videos        0.67
iltr          0.66
borgh         0.66
ikini         0.65
ourney        0.65
oking         0.65
Activations Density 0.000%
No Known Activations
This feature has no known activations.