INDEX
Explanations
phrases that urge people to take action or perform a good deed
Negative Logits
ulous     -0.16
837       -0.15
nx        -0.15
onica     -0.15
ATS       -0.14
heaven    -0.14
Heaven    -0.14
finity    -0.14
phant     -0.14
aspir     -0.14
Positive Logits
errupt    0.16
BOT       0.15
elda      0.14
assin     0.14
æ¦ľ       0.14
oppel     0.14
udder     0.14
ereg      0.14
elez      0.14
davran    0.13
Activations Density: 0.209%