Explanations
phrases indicating persuasion or attempts to convince others
Negative Logits
Token         Logit
alue          -0.17
arkan         -0.15
.nih          -0.15
省            -0.14
太郎          -0.14
친            -0.14
occasion      -0.14
.readValue    -0.14
roj           -0.14
úp            -0.14
Positive Logits
Token         Logit
exc           0.16
432           0.15
å             0.15
drives        0.15
elsen         0.14
ought         0.14
poil          0.14
离            0.14
itzer         0.14
licensors     0.14
Activations Density 0.019%
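The dashboard does not define the density figure. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below computes that fraction; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate a value of the same size as the one reported.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not state this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, with the feature firing on 19 of them.
sample = [0.0] * 100_000
for i in range(19):
    sample[i * 5_000] = 1.0  # arbitrary positive activation values

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.019% would mean the feature is active on roughly 19 of every 100,000 tokens, so it is sparse.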