INDEX
Explanations
No Explanations Found
Negative Logits
angel       -0.76
Ambro       -0.74
>>\         -0.73
elev        -0.72
Vaugh       -0.71
ngth        -0.69
ð           -0.69
esson       -0.66
href        -0.66
ills        -0.66
Positive Logits
runaway      0.70
anarchist    0.69
spotted      0.68
ooters       0.64
predicting   0.64
autonomy     0.61
flee         0.61
pals         0.61
shy          0.60
anarchists   0.60
Activations Density: 0.000%
No Known Activations
This feature has no known activations.