Explanations
references to philosophical concepts and contradictions
Negative Logits
Token       Logit
γκ          -0.16
ylon        -0.16
antasy      -0.15
kazy        -0.15
ルフ        -0.14
$GLOBALS    -0.14
chw         -0.14
plied       -0.14
uard        -0.14
_vlog       -0.13
Positive Logits
Token       Logit
judging     0.48
according   0.44
based       0.42
Based       0.36
according   0.36
based       0.34
jud         0.32
Based       0.31
Jud         0.30
Jud         0.30
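Lists like the Negative and Positive Logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The following is a minimal sketch of that computation. It assumes this projection, since the dashboard's exact pipeline is not stated here. The vocabulary, the 2-dimensional model, and all weights are hypothetical toy values.

```python
from typing import List, Sequence, Tuple

# Assumption: a feature's logit weights are its decoder direction projected
# through the unembedding matrix W_U (d_model rows x vocab_size columns).
# All numbers below are hypothetical toy values, not real model weights.

def logit_weights(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocab token: sum_i decoder_dir[i] * W_U[i][t]."""
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

def top_bottom_k(vocab: Sequence[str], scores: Sequence[float], k: int
                 ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split scores into the k most positive and the k most negative tokens."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical 2-dimensional model with a 5-token vocabulary.
vocab = ["judging", "according", "based", "antasy", "ylon"]
decoder_dir = [1.0, 0.5]
W_U = [
    [0.3, 0.3, 0.2, -0.1, -0.1],
    [0.2, 0.1, 0.2, 0.0, -0.1],
]

scores = logit_weights(decoder_dir, W_U)
positive, negative = top_bottom_k(vocab, scores, k=2)
print("Positive:", [tok for tok, _ in positive])  # ['judging', 'according']
print("Negative:", [tok for tok, _ in negative])  # ['ylon', 'antasy']
```

With real weights, `W_U` would be the model's full unembedding and `k` would be 10, which matches the ten entries shown in each list.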
Activations Density 0.092%
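Activations Density is the fraction of token positions on which the feature fires. A density of 0.092% works out to roughly 920 active positions per million tokens (0.00092 × 1,000,000). The following is a minimal sketch. It assumes that "fires" means an activation strictly above zero. The dashboard's actual threshold and sampled corpus are not given here, and the activation data below is hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when activation > threshold (0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0

# Hypothetical toy data: 10,000 positions, the feature fires on 9 of them.
acts = [0.0] * 10_000
for pos in (12, 400, 1_337, 2_048, 3_000, 5_555, 7_001, 8_192, 9_999):
    acts[pos] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.090%
```

A sparse feature like this one needs a large token sample before its density estimate stabilizes, because only a handful of positions in each thousand contribute to the count.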