INDEX
Explanations
phrases indicating negotiation, exceptions, or conditions related to rules and theories
Negative Logits
idth     -0.17
udder    -0.17
apid     -0.16
eker     -0.16
oren     -0.15
hsi      -0.14
oris     -0.14
iese     -0.14
ullan    -0.14
İ        -0.14
Positive Logits
circum        0.20
otherwise     0.18
exceptions    0.17
forced        0.17
Otherwise     0.16
overcome      0.16
Schwe         0.15
overcoming    0.15
natal         0.15
ritch         0.15
Activations Density: 0.020%