Explanations
No Explanations Found
Negative Logits
Token      Logit
Calder     -0.70
acas       -0.69
urat       -0.63
antam      -0.62
Compan     -0.61
Vegas      -0.58
Dancing    -0.57
Kindle     -0.56
Tucson     -0.56
teenth     -0.55
Positive Logits
Token      Logit
we          1.47
_>          0.73
feld        0.71
uke         0.68
azer        0.66
pan         0.66
fl          0.65
rog         0.65
rex         0.65
omsky       0.65
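The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such lists can be produced. It assumes, without confirmation from this page, that each score is the dot product of the feature's decoder direction with the model's unembedding matrix. `decoder_dir`, `W_U`, `logit_effects`, and `top_logits` are hypothetical names.

```python
import numpy as np

def logit_effects(decoder_dir, W_U):
    """Score every vocabulary token for one feature.

    Assumption: decoder_dir has shape (d_model,) and W_U has shape
    (d_model, d_vocab); the result has shape (d_vocab,).
    """
    return decoder_dir @ W_U

def top_logits(scores, vocab, k=10):
    """Return the k lowest-scoring and k highest-scoring (token, score) pairs."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)  # indices sorted by ascending score
    negative = [(vocab[i], round(float(scores[i]), 2)) for i in order[:k]]
    positive = [(vocab[i], round(float(scores[i]), 2)) for i in order[::-1][:k]]
    return negative, positive

# Toy example: only the first model dimension is active in the feature direction
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logits(logit_effects(np.array([1.0, 0.0]), W_U), vocab, k=2)
print(neg, pos)
```

With `k=2`, the toy run ranks `b` and `d` lowest and `c` and `a` highest. The "negative" list is the bottom k by rank, so it can include a score of zero when few tokens are actually pushed down.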
Activation Density: 0.000%
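Activation density is the share of sampled tokens on which the feature fires. Below is a minimal sketch. It assumes a feature counts as active when its activation exceeds 0, because the page states neither its threshold nor its sample size. `activation_density` is a hypothetical helper. A density of 0.000% is consistent with the note that this feature has no known activations.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: `threshold=0.0` stands in for whatever cutoff the dashboard uses.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        return 0.0  # no samples: report zero density rather than NaN
    return float(np.mean(acts > threshold)) * 100.0

# A feature that never fires on the sampled tokens
print(f"Activation Density: {activation_density([0.0, 0.0, 0.0]):.3f}%")
```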
No Known Activations
This feature has no known activations.