Explanations
references to exceptions in rules or norms
Negative Logits
shake     -0.15
manship   -0.15
iaux      -0.15
slick     -0.14
agra      -0.14
isha      -0.14
iske      -0.14
lek       -0.13
ona       -0.13
LETE      -0.13
POSITIVE LOGITS
사항      0.20
ively     0.17
Schwar    0.15
ually     0.15
aldi      0.15
ities     0.15
사항      0.15
际        0.14
swith     0.14
ably      0.14
Activations Density 0.026%