Explanations
phrases indicating opinion or intent
Negative Logits
PerformLayout   -0.71
ylvan           -0.71
NonQuery        -0.65
LookAnd         -0.63
sprozess        -0.62
AFFIRMED        -0.61
Tasche          -0.60
Races           -0.60
Delayed         -0.60
っぱり          -0.59
Positive Logits
means           1.42
mean            1.36
means           1.28
Means           1.20
Means           1.17
MEANS           1.04
Mean            0.98
mean            0.98
Mean            0.94
meant           0.94
Activations Density: 0.109%