INDEX
Explanations
The neuron appears to associate "especially" and "least" with contrast or emphasis in specific contexts
Negative Logits
Най	1.32
d	1.25
특별시	1.19
써	1.19
lN	1.14
椹	1.12
데	1.11
ll	1.09
Kerala	1.09
i	1.09
Positive Logits
на	1.90
ات	1.87
م	1.65
ियर	1.50
ים	1.44
্স	1.44
اته	1.43
вдруг	1.38
м	1.38
ţii	1.36
Activations Density 0.102%
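
To put the density figure in context, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of token positions on which the neuron fires above a threshold; the page does not state the definition, so treat that as an assumption. The function name, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions on which the neuron is
    active. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 10 of which are active.
sample = [0.0] * 9990 + [0.7] * 10
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under this reading, a density of 0.102% would correspond to the neuron firing on roughly one token in a thousand.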