Explanations
the presence of specific capitalized names or entities
Negative Logits
urs     -0.18
ιχ      -0.16
arr     -0.15
asal    -0.15
uda     -0.15
ome     -0.15
038     -0.15
ادي     -0.15
monic   -0.15
ois     -0.15
Positive Logits
aylor   0.20
ogue    0.19
ordin   0.18
eria    0.18
eri     0.18
ayaran  0.18
yst     0.17
ields   0.17
istor   0.17
urm     0.16
Activation Density: 0.021%