INDEX
Explanations
comparative adjectives
Negative Logits
ainted       -0.80
advertising  -0.74
shire        -0.70
EP           -0.67
EVA          -0.66
este         -0.65
mberg        -0.65
hr           -0.65
ité          -0.64
Ward         -0.63
Positive Logits
than         1.90
Than         1.65
than         1.61
versions     0.90
"$:/         0.86
iating       0.82
ado          0.73
Faster       0.73
behaved      0.72
versions     0.68
Activations Density: 0.179%