INDEX
Explanations
instances of hypocrisy and self-contradictory behavior in arguments
Negative Logits
ikes        -0.17
çĸ          -0.14
erdale      -0.14
Tru         -0.14
aghan       -0.14
kå          -0.14
SETTINGS    -0.13
perg        -0.13
Elem        -0.13
pecting     -0.13
Positive Logits
nun         0.15
ouro        0.15
éħ          0.15
Celt        0.14
adil        0.14
essel       0.14
obus        0.14
Vision      0.14
ause        0.14
aste        0.14
Activations Density: 0.295%