INDEX
Explanations
references to authenticity and manipulation in various contexts
Negative Logits
Token       Logit
quip        -0.17
inen        -0.16
loat        -0.14
decentral   -0.14
tang        -0.14
vis         -0.14
/bower      -0.13
à¤Ĥदर      -0.13
loc         -0.13
Shown       -0.13
Positive Logits
Token          Logit
artificial     0.38
Artificial     0.34
planned        0.30
artificially   0.29
intentional    0.29
deliberate     0.28
unnatural      0.26
carefully      0.25
synthetic      0.22
engineered     0.22
Activations Density: 0.293%
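
The density figure is most naturally read as the fraction of sampled tokens on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the toy activation array are assumptions made for illustration and do not come from the dashboard or its underlying data.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Return the fraction of entries strictly above `threshold`.

    `activations` holds one feature's activation per token.
    The zero threshold is an assumption, not a documented setting.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("activations must not be empty")
    return float(np.mean(acts > threshold))


# Toy data (made up): 100,000 tokens, of which 293 activate the feature.
sample = np.zeros(100_000)
sample[:293] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.293% would mean the feature fires on roughly 3 of every 1,000 tokens, which makes it a sparse feature.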