Explanations
The neuron detects mathematical equations and expressions
Negative Logits
磪	0.62
闻	0.61
passionately	0.61
disapproval	0.60
সাধারণত	0.58
oughby	0.57
থায়	0.57
isn	0.56
blir	0.56
聞	0.56
Positive Logits
two	1.09
two	1.02
TWO	1.02
three	1.00
three	0.98
zwei	0.98
兩個	0.98
dwóch	0.98
four	0.96
THREE	0.96
Activations Density 0.404%