INDEX
Explanations
phrases that indicate privacy protection and data handling
Negative Logits
Token       Logit
aptop       -0.16
issing      -0.16
lus         -0.16
almost      -0.14
uzey        -0.14
osite       -0.14
bastante    -0.14
innie       -0.14
аж          -0.14
sometimes   -0.14
Positive Logits
Token       Logit
nor         0.38
nor         0.33
EVER        0.25
Nor         0.24
Nor         0.23
NOR         0.22
ever        0.21
knowingly   0.19
也不        0.18
-ever       0.17
Activations Density 0.251%