Explanations
comparisons or contrasts between different concepts
comparative phrases that indicate evaluation or judgment
NEGATIVE LOGITS
ゼウス      -0.79
leen        -0.74
awar        -0.73
terness     -0.73
ulner       -0.68
moil        -0.68
unks        -0.67
pez         -0.67
onen        -0.67
iple        -0.66
POSITIVE LOGITS
actual       1.31
anything     1.23
genuine      1.15
substantive  1.11
conclusive   1.10
meaningful   1.00
outright     0.98
legitimate   0.98
definitive   0.96
serious      0.95
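The dashboard does not say how the two logit lists are produced. Feature dashboards for sparse autoencoders commonly project the feature's decoder direction through the model's unembedding matrix and then report the most negative and most positive entries. That convention is assumed here and is not confirmed by the source. The sketch below uses plain Python lists. The function name `logit_effects` and its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption (not from the source): unembed[i] is the d_model-dimensional
    unembedding vector for vocab[i], so each score is the dot product of the
    feature's decoder direction with that token's unembedding vector.
    """
    if len(unembed) != len(vocab):
        raise ValueError("unembed and vocab must have the same length")
    scores = [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]
    # Sort ascending by score, so the front holds the most suppressed tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = logit_effects(
    decoder_dir=[1.0, 0.5],
    unembed=[[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]],
    vocab=["actual", "leen", "other"],
    k=1,
)
print(neg)  # [('leen', -1.0)]
print(pos)  # [('actual', 1.0)]
```

The real dashboard values may be scaled or normalized differently. The sketch only illustrates the ranking step.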
Activation density: 0.271%
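Activation density is usually read as the share of token positions on which the feature fires. The following is a minimal sketch. It assumes that "fires" means an activation strictly above a threshold of zero, since the source does not state the threshold. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active only if its value is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: the feature fires on 1 of 4 token positions.
print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 25.0
```

Under this definition, a density of 0.271% means the feature is active on roughly 2.7 of every 1,000 tokens.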