INDEX
Explanations
No Explanations Found
Negative Logits
Token         Logit
auri          -0.61
â             -0.61
scorer        -0.60
Liter         -0.60
sing          -0.59
thread        -0.58
Community     -0.57
astic         -0.57
Rational      -0.56
Metropolitan  -0.56
Positive Logits
Token         Logit
neighb        0.81
destro        0.73
eer           0.69
inia          0.69
tremend       0.68
orsi          0.68
gress         0.65
embr          0.65
Horowitz      0.64
warr          0.64
Activations Density 0.000%
No Known Activations
This feature has no known activations.