Explanations
Instances of deception and manipulation, particularly in the context of technology and society.
Negative Logits
Token        Logit
Cur          -0.17
ighth        -0.16
avr          -0.15
aus          -0.14
Cur          -0.14
avez         -0.14
phant        -0.14
Wilkinson    -0.14
Ment         -0.14
benef        -0.13
Positive Logits
Token          Logit
adol           0.15
unsus          0.15
BitConverter   0.15
urette         0.14
apo            0.14
641            0.14
/cms           0.14
lingen         0.14
ondon          0.14
ndl            0.14
Activation Density: 0.328%