Explanations
references to language and translation
Negative Logits
ox        -0.16
Ireland   -0.15
339       -0.15
ض         -0.14
chner     -0.14
intra     -0.14
ei        -0.13
pron      -0.13
cstdint   -0.13
ov        -0.13
Positive Logits
English    0.40
English    0.35
Spanish    0.32
Hebrew     0.31
languages  0.30
Arabic     0.30
english    0.29
英语       0.29
French     0.28
english    0.28
Activations Density 0.164%
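
Dashboards like this often display non-ASCII tokens in the tokenizer's byte-level form, so a token such as 英语 ("English" in Chinese) can show up as the string èĭ±è¯Ń. The sketch below shows one way to map such strings back to readable text. It assumes the model uses a GPT-2-style byte-level BPE vocabulary; the function names `bytes_to_unicode` and `decode_token` are illustrative and not part of this dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE.

    Assumption: the tokenizer behind this dashboard follows this scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Convert a byte-level token string back to UTF-8 text."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    # Single tokens can be partial UTF-8 sequences, so replace invalid bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("èĭ±è¯Ń"))
    print(repr(decode_token("ĠEnglish")))
```

Under the same assumption, the leading `Ġ` in a token string stands for a space, which may explain why some tokens in the lists above look identical once whitespace is stripped.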