Explanations
No Explanations Found
Negative Logits
Token       Logit
promin      -0.18
outlier     -0.15
gran        -0.15
Abb         -0.14
televis     -0.14
UIScreen    -0.14
chw         -0.14
seperate    -0.14
seper       -0.14
eb          -0.14
Positive Logits
Token       Logit
Romanian    0.25
ă           0.23
cea         0.22
Romania     0.22
pentru      0.19
ilor        0.18
și          0.17
Roman       0.17
în          0.17
nesc        0.17
Activations Density: 0.000%
No Known Activations
This feature has no known activations.