INDEX
Explanations
eligibility
The neuron detects the word “eligibility” (and its variants) in text.
Negative Logits
Attacks    -0.08
็อก    -0.08
atan    -0.07
आक    -0.07
attacks    -0.07
216    -0.07
Recorded    -0.06
文    -0.06
khắc    -0.06
intents    -0.06
Positive Logits
eligible    0.16
eligible    0.12
ineligible    0.12
Elig    0.11
eligibility    0.11
elig    0.09
elites    0.09
El    0.08
exhilar    0.07
GE    0.07
Activations Density 0.003%
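The density figure can be read as the share of tokens on which the neuron fires at all. The sketch below is a minimal illustration of that reading, not the dashboard's actual computation: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (here: strictly greater than threshold) activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 firing tokens out of 100,000 tokens seen.
sample = [0.0] * 99_997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.003%
```

Under this reading, a density of 0.003% means the neuron is active on roughly 3 tokens in every 100,000, which is consistent with a detector for a narrow word family.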