Explanations
instances of value judgments and evaluations
Negative Logits
Token       Logit
atak        -0.17
.appspot    -0.15
alace       -0.15
eiusmod     -0.15
.Undef      -0.14
bellion     -0.14
Damn        -0.14
uez         -0.14
htar        -0.14
.study      -0.14
Positive Logits
Token        Logit
option       0.22
added        0.22
following    0.22
possibility  0.20
prospect     0.20
distinct     0.18
suggestion   0.18
ability      0.17
novel        0.16
proposition  0.16
Activation Density: 0.159%