Explanations
phrases related to negative experiences or consequences
Negative Logits
amusement        -0.76
characteristic   -0.73
convenience      -0.71
absence          -0.71
lifetime         -0.69
Tsukuyomi        -0.67
Sunshine         -0.66
constituent      -0.65
Identification   -0.64
proceeding       -0.64
Positive Logits
raise            1.15
activate         1.14
handedly         1.14
heartedly        1.11
apply            1.10
create           1.09
react            1.06
fill             1.03
align            1.03
ply              1.02
Activations Density 0.042%