Explanations
instances of subjective descriptions and evaluative statements
Negative Logits
TestingModule    -0.17
dou              -0.15
vise             -0.15
preamble         -0.14
iets             -0.14
Kendrick         -0.14
spm              -0.14
ongan            -0.14
stå              -0.14
stance           -0.13
Positive Logits
idon             0.15
adesh            0.15
inson            0.15
.strict          0.15
iji              0.15
ÃĹ↵↵             0.14
umb              0.14
á»±c             0.14
»                0.14
amel             0.13
Activations Density: 0.100%