INDEX
Explanations
words indicating necessity or obligation
Negative Logits
omain         -0.17
rift          -0.16
licative      -0.15
adele         -0.15
room          -0.15
рей           -0.15
zim           -0.14
restricted    -0.14
rooms         -0.14
евич          -0.14
Positive Logits
izia          0.16
抜            0.15
velle         0.14
dou           0.14
enan          0.14
dou           0.14
チュ          0.14
uilt          0.14
길            0.14
طح            0.14
Activations Density 0.000%