INDEX
Explanations
contradictions and discussions surrounding values and actions related to beliefs
Negative Logits
lest           -0.15
ассив          -0.14
雄             -0.14
dummy          -0.14
esters         -0.14
Comparable     -0.13
cono           -0.13
.Dependency    -0.13
indre          -0.13
cef            -0.13
Positive Logits
contrary       0.51
CONTR          0.44
contrad        0.42
contr          0.42
Contr          0.41
Contr          0.40
conflict       0.36
contr          0.36
contradict     0.36
contrast       0.35
Activations Density 0.286%