INDEX
Explanations
No Explanations Found
Negative Logits
ategory     -0.77
raft        -0.75
rary        -0.70
uitous      -0.70
bole        -0.68
aft         -0.67
apes        -0.66
ãĤ¨ãĥ«      -0.66
nesday      -0.66
gradient    -0.66
Positive Logits
Behavioral  0.70
zo          0.66
Shelter     0.64
DEN         0.64
FAA         0.64
Disabled    0.63
USAF        0.63
Anthrop     0.62
Investig    0.62
grounded    0.62
Activations Density 0.000%
No Known Activations