INDEX
Explanations
negations and contradictory statements
Negative Logits
ente         -0.15
ilha         -0.15
sko          -0.14
gren         -0.14
gan          -0.14
nes          -0.14
venes        -0.14
avl          -0.13
alo          -0.13
.WinForms    -0.13
Positive Logits
forth        0.18
lify         0.17
theless      0.17
umber        0.17
ìĦĿ          0.16
ĶåĽŀ         0.16
ptune        0.15
weg          0.15
soever       0.15
orage        0.14
Activations Density: 0.023%