Explanations
No Explanations Found
Negative Logits
Token            Logit
ɚ                -0.64
AndEndTag        -0.62
Amerikaanse      -0.54
creș             -0.53
在美国            -0.53
argint           -0.52
Infórmanos       -0.51
Daß              -0.51
amerikanischen   -0.51
ității           -0.51
Positive Logits
Token            Logit
UK               1.12
Britain          1.05
£                1.03
British          1.02
(£               0.96
英国              0.96
BRITISH          0.94
Whilst           0.93
Whilst           0.93
-£               0.93
Activations Density: 0.000%
No Known Activations
This feature has no known activations.