INDEX
Explanations
No Explanations Found
Negative Logits
ulner         -0.65
fog           -0.63
sidx          -0.62
decomp        -0.62
Integ         -0.60
eclipse       -0.60
numb          -0.60
rust          -0.59
Airbnb        -0.59
collectively  -0.59
Positive Logits
kered         0.94
olic          0.89
awed          0.76
shaw          0.74
iliary        0.70
ocrats        0.69
reditary      0.69
elfare        0.69
het           0.68
zag           0.68
Activations Density 0.000%
No Known Activations
This feature has no known activations.