Explanations
No Explanations Found
Negative Logits
Token       Logit
captive     -0.70
jar         -0.67
throat      -0.66
ple         -0.65
atio        -0.64
mort        -0.62
disple      -0.62
ching       -0.62
quartered   -0.61
heir        -0.60
Positive Logits
Token       Logit
uggest      0.74
IVERS       0.68
rences      0.68
glomer      0.66
allas       0.65
baum        0.65
antid       0.63
Wildcats    0.63
olkien      0.63
yrinth      0.63
Activations Density: 0.000%
No Known Activations
This feature has no known activations.