Explanations
No Explanations Found
Negative Logits
Token          Logit
consequential  -0.75
olin           -0.70
POSE           -0.69
istani         -0.65
FORE           -0.64
overrun        -0.62
inhal          -0.62
fetch          -0.61
prem           -0.61
conditional    -0.60
Positive Logits
Token    Logit
ãĥķãĤ¡   0.74
Blocks   0.68
Badge    0.66
stals    0.65
Clever   0.64
illy     0.62
Truth    0.61
anooga   0.61
ounces   0.61
Fair     0.60
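The first positive-logit token, ãĥķãĤ¡, looks like mojibake but is consistent with how GPT-2-style byte-level BPE vocabularies display raw bytes as printable characters. The sketch below assumes this feature's model uses that byte-to-character mapping (the page does not say which tokenizer is in use). It rebuilds the mapping and inverts it to recover the underlying text. The helper name `decode_token` is hypothetical and chosen only for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table.

    Assumption: the token strings on this page come from a tokenizer
    that uses this mapping; the page itself does not confirm it.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Map a displayed token back to raw bytes, then decode as UTF-8."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    # A single token may hold a partial UTF-8 sequence, so do not fail hard.
    return raw.decode("utf-8", errors="replace")


# Escaped form of the token shown in the Positive Logits list.
print(decode_token("\u00e3\u0125\u0137\u00e3\u0124\u00a1"))  # -> ファ
```

Under that assumption the token is the katakana pair ファ. Plain ASCII tokens such as `Blocks` pass through unchanged. A leading `Ġ` in this display convention would decode to a space, though none is visible in the lists above.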
Activations Density: 0.000%
No Known Activations
This feature has no known activations.