Explanations
No Explanations Found
Negative Logits
idth       -0.75
lan        -0.75
idi        -0.74
itate      -0.73
lis        -0.70
rosso      -0.70
Machina    -0.69
enegger    -0.69
tti        -0.68
opsis      -0.66
Positive Logits
disg       0.80
senses     0.73
racket     0.71
jew        0.67
account    0.66
ommel      0.66
Ukrain     0.63
htt        0.61
transl     0.61
deal       0.60
Activations Density: 0.000%
No Known Activations
This feature has no known activations.