INDEX
Explanations
No Explanations Found
Negative Logits
Token       Logit
Dominion    -0.77
Robertson   -0.68
Horizon     -0.67
Samar       -0.66
ories       -0.64
Canaver     -0.60
Kot         -0.58
Kirk        -0.58
KNOWN       -0.58
Larson      -0.57
Positive Logits
Token       Logit
ele         0.83
fee         0.80
STATE       0.79
laws        0.74
law         0.74
weeds       0.72
Bloom       0.69
ois         0.69
law         0.68
state       0.67
Activations Density 0.000%
No Known Activations
This feature has no known activations.