INDEX
Explanations
words related to strong physical reactions and movements
terms related to compulsive behavior and its effects
Negative Logits
bye         -0.63
hold        -0.59
Fellow      -0.59
Cage        -0.58
lihood      -0.57
Dominion    -0.57
Cald        -0.57
Amen        -0.57
Misty       -0.57
PCIe        -0.55
Positive Logits
uls         1.16
atility     1.09
atile       1.07
untarily    0.94
ules        0.93
hip         0.92
atives      0.91
eni         0.90
erker       0.90
untary      0.89
Activations Density: 0.008%