INDEX
Explanations
negations or phrases that indicate a departure from expectations
Negative Logits
Token       Logit
ÙĥاÙĦ        -0.15
527         -0.15
ausge       -0.15
OrElse      -0.14
tenant      -0.14
inos        -0.13
854         -0.13
adultes     -0.13
ursal       -0.13
estion      -0.13
Positive Logits
Token         Logit
ordinary      0.23
typical       0.22
conventional  0.20
merely        0.19
mere          0.18
like          0.18
traditional   0.17
mere          0.17
another       0.16
your          0.16
Activations Density: 0.086%