Explanations
No Explanations Found
Negative Logits
Invention    -0.71
Franch       -0.65
Horses       -0.64
Ard          -0.63
Scenes       -0.63
Shine        -0.61
Uber         -0.61
Speed        -0.60
namely       -0.59
Canaan       -0.58
Positive Logits
illes        0.78
agascar      0.76
Wan          0.74
ushima       0.71
icans        0.70
isf          0.69
oma          0.69
ogl          0.69
indebted     0.68
odon         0.68
Activations Density: 0.000%
No Known Activations
This feature has no known activations.