Explanations
words and phrases related to assessments or judgments, often expressing strong opinions or definitions
expressions of subjective opinions or evaluations
Negative Logits
"cair"      -0.66
"anmar"     -0.63
"ieties"    -0.63
"Citiz"     -0.62
"ojure"     -0.61
"uador"     -0.60
"lvl"       -0.60
"vana"      -0.59
"arnaev"    -0.59
"kefeller"  -0.59
Positive Logits
","                  0.74
"unsu"               0.73
"unavailable"        0.72
"preferable"         0.69
"excluded"           0.69
"indistinguishable"  0.68
"identical"          0.67
"incompatible"       0.66
"superior"           0.66
"obliged"            0.65
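The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Here is a minimal sketch of one common way such lists are produced. It assumes that each token's score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix, followed by a sort. The function name `top_logit_effects`, the toy vocabulary, and the toy numbers are hypothetical and are not taken from this feature.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # unembed[i][t]: d_model x vocab (assumed layout)
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest-scoring and k highest-scoring tokens.

    Assumption: a token's score is the dot product of the feature's decoder
    direction with that token's column of the unembedding matrix.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical toy example: d_model = 2, vocabulary of four tokens.
neg, pos = top_logit_effects(
    [1.0, 0.5],
    [[0.2, -0.4, 0.9, 0.0], [0.4, -0.2, 0.2, 1.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing from both ends yields both lists in a single pass. With a real model, the vocabulary and matrices would come from the model being analyzed rather than from toy values.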
Activation Density: 0.133%
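Activation density is the share of sampled tokens on which the feature is active. At 0.133%, that works out to roughly 133 active tokens per 100,000. The sketch below assumes that "active" means an activation strictly greater than zero. Both that threshold and the function name `activation_density` are illustrative assumptions, not documented here.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


# Hypothetical sample: 133 active tokens out of 100,000.
sample = [0.0] * 99_867 + [1.0] * 133
print(f"{activation_density(sample):.3f}%")
```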