Explanations
phrases indicating belonging or possession
Negative Logits
Token         Logit
ows           -0.16
ots           -0.14
bero          -0.14
emas          -0.14
iment         -0.14
ÙİÙĬ          -0.14
HeaderCode    -0.13
ÏĦοι          -0.13
ves           -0.13
ade           -0.13
Positive Logits
Token         Logit
ulton         0.18
ters          0.17
ahn           0.16
iesen         0.15
wards         0.15
ubre          0.14
umblr         0.14
Argb          0.14
gie           0.14
aleigh        0.14
Activations Density 0.048%