INDEX
Explanations
negative or restrictive expressions in the text
Negative Logits
Token     Logit
OVE       -0.17
ắn        -0.15
алÑĥ      -0.15
awns      -0.14
opers     -0.14
boy       -0.14
ubu       -0.13
ngr       -0.13
hereby    -0.13
itra      -0.13
Positive Logits
Token          Logit
mind           0.29
dream          0.28
mind           0.24
Mind           0.24
Dream          0.23
Mind           0.23
Dream          0.21
梦             0.21
necessarily    0.21
wouldn         0.20
Activations Density: 0.086%