INDEX
Explanations
criticisms of hypocrisy and its effects on societal values
Negative Logits
undry            -0.17
yw               -0.16
itsu             -0.15
posite           -0.15
ThanOrEqualTo    -0.15
)application     -0.15
TOTYPE           -0.15
Ỽ                -0.14
ãĥ©ãĤ¯           -0.14
unist            -0.14
Positive Logits
!                0.17
Or               0.16
hence            0.15
ware             0.15
iod              0.15
they             0.14
_                0.14
consequence      0.14
iti              0.14
And              0.14
Activations Density: 0.200%