Explanations
concepts related to hypocrisy and inconsistent beliefs
Negative Logits
لات         -0.15
Bien        -0.15
blas        -0.15
éric        -0.14
iek         -0.14
uid         -0.14
iu          -0.14
cyn         -0.14
iment       -0.13
ester       -0.13
Positive Logits
straw       0.19
defenses    0.18
Straw       0.17
defenders   0.17
skyt        0.16
attacks     0.16
oje         0.16
wap         0.15
defend      0.15
pole        0.15
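The page does not say how the negative and positive logit lists are produced. A common approach for sparse-autoencoder features is direct logit attribution: project the feature's decoder direction through the model's unembedding matrix and rank every vocabulary token by the result. The sketch below assumes that method. The function name `top_logit_effects`, the toy two-dimensional decoder direction, and the toy unembedding values are hypothetical illustrations, not the values behind this feature.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by the dot product of a feature's decoder direction
    with each token's unembedding column (assumed direct logit attribution).

    unembed[i][t] is the weight from residual-stream dimension i to token t.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Effect of one unit of feature activation on token t's logit.
        score = sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]  # most promoted tokens, highest score first
    return negative, positive


# Hypothetical toy model: 2 residual dimensions, 4 vocabulary tokens.
vocab = ["straw", "defend", "Bien", "cyn"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, 0.1, -0.1, 0.0],
    [-0.1, 0.0, 0.1, 0.2],
]
negative, positive = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("negative:", [tok for tok, _ in negative])
print("positive:", [tok for tok, _ in positive])
```

In this toy setup "straw" and "defend" come out as the most promoted tokens and "Bien" and "cyn" as the most suppressed, mirroring how the two lists above separate tokens the feature pushes up from tokens it pushes down.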
Activation density: 0.840%
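Activation density is usually read as the share of token positions on which the feature fires at all. The minimal sketch below assumes "fires" means an activation strictly above zero; the page does not state its threshold, and the `activation_density` helper is hypothetical. Under that reading, 0.840% corresponds to about 84 active positions in every 10,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition; the source page does not specify one)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 84 of 10,000 positions.
sample = [0.0] * 9916 + [1.2] * 84
print(f"{activation_density(sample):.3f}%")
```

A density this low marks a sparse feature: it stays silent on almost all text and activates only in the narrow contexts its explanation describes.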