INDEX
Explanations
references to contrasting concepts, particularly related to good and bad
Negative Logits
Token       Logit
inson       -0.15
ãĤ«ãĥ«      -0.15
lep         -0.15
оло         -0.14
Lİ          -0.14
Depot       -0.14
218         -0.14
atorium     -0.14
ULONG       -0.14
iliar       -0.14
Positive Logits
Token       Logit
bad         0.49
bad         0.45
Bad         0.42
Bad         0.40
_bad        0.38
BAD         0.34
åĿı         0.32
.bad        0.31
BAD         0.29
evil        0.28
Activations Density 0.081%
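
Some token strings in the logit lists (for example åĿı and ãĤ«ãĥ«) look like byte-level BPE display forms rather than readable text. The sketch below shows one way to turn them back into characters, assuming the tokens follow the GPT-2 style byte-to-character convention; the source does not state which tokenizer produced them, so treat that as an assumption. The helper names byte_level_table and decode_token are hypothetical and used only for illustration.

```python
def byte_level_table():
    """Map GPT-2 style stand-in characters back to the raw bytes they represent.

    Assumption: the dashboard shows tokens in the GPT-2 byte-level convention.
    """
    # Bytes displayed as themselves: printable ASCII and most of Latin-1.
    keep = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    char_to_byte = {chr(b): b for b in keep}
    # Every remaining byte is displayed as a character starting at U+0100.
    offset = 0
    for b in range(256):
        if b not in keep:
            char_to_byte[chr(256 + offset)] = b
            offset += 1
    return char_to_byte


CHAR_TO_BYTE = byte_level_table()


def decode_token(token: str) -> str:
    """Decode a byte-level token string; return it unchanged if it is not in that form."""
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token  # contains characters outside the byte alphabet, so leave as-is
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token may end in the middle of a multi-byte character; show U+FFFD there.
    return raw.decode("utf-8", errors="replace")


print(decode_token("åĿı"))     # 坏
print(decode_token("ãĤ«ãĥ«"))  # カル
```

Under that assumption, åĿı decodes to 坏, a Chinese word for "bad", which agrees with the other top positive tokens. A token such as Lİ ends in an incomplete multi-byte sequence, so it has no clean text form on its own.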