INDEX
Explanations
the presence of the substring "ly" in various forms
Negative Logits
Token   Logit
azzo    -0.17
adlo    -0.16
adio    -0.16
adu     -0.16
esan    -0.15
596     -0.15
ÅŁk     -0.15
icz     -0.15
azzi    -0.15
WN      -0.13
Positive Logits
Token        Logit
ạp           0.16
té           0.16
itar         0.15
thing        0.15
usercontent  0.14
chs          0.14
Äįas         0.14
MORE         0.14
honor        0.14
ège          0.14
Activations Density 0.023%