INDEX
Explanations
phrases indicating old-fashioned or traditional styles and practices
Negative Logits
aper      -0.18
ende      -0.15
ICON      -0.15
abor      -0.14
rou       -0.14
æĵ        -0.14
Guard     -0.14
tec       -0.14
/icons    -0.14
mim       -0.13
Positive Logits
Ãĸr       0.17
olith     0.15
artz      0.15
rang      0.15
olin      0.15
ilies     0.14
üstü      0.14
ãģĵãģĿ    0.14
clide     0.14
ewan      0.14
Activations Density 0.002%