INDEX
Explanations
No Explanations Found
Negative Logits
neighb      -0.74
creditor    -0.65
lamp        -0.65
centres     -0.64
centers     -0.64
unpop       -0.64
annex       -0.63
spinning    -0.63
newsp       -0.62
lou         -0.61
Positive Logits
umat        0.88
ensen       0.81
oen         0.79
ciples      0.76
enic        0.76
ahon        0.75
ute         0.75
heed        0.75
ihad        0.75
erest       0.73
Activations Density 0.000%
No Known Activations
This feature has no known activations.