INDEX
Negative Logits
�             -0.07
lying         -0.07
spep          -0.07
              -0.07
Dys           -0.07
REAL          -0.07
,,            -0.07
drama         -0.06
,:),          -0.06
Description   -0.06
Positive Logits
=========     0.08
throat        0.08
(thread       0.08
◠             0.07
Harden        0.07
atore         0.07
tures         0.07
attribute     0.07
rik           0.07
etrics        0.07
Activations Density 0.022%