Explanations
No Explanations Found
Negative Logits
derog        -0.73
downs        -0.67
hler         -0.67
audio        -0.65
gae          -0.64
--------------------------------------------------------  -0.64
quotas       -0.63
retaliation  -0.62
friends      -0.62
itism        -0.61
Positive Logits
ancock       0.81
Gutenberg    0.73
Surviv       0.72
Pradesh      0.71
ogly         0.70
ospace       0.70
Awareness    0.69
phis         0.67
profession   0.65
Technician   0.64
Activations Density 0.000%
No Known Activations
This feature has no known activations.