Explanations
phrases indicating control and manipulation
Negative Logits
Cop         -0.17
kop         -0.15
opy         -0.15
ури         -0.15
ARAM        -0.14
Cop         -0.14
Freund      -0.14
Pink        -0.14
çı          -0.14
ensen       -0.13
Positive Logits
Comments     0.17
Comment      0.17
.Comment     0.17
umas         0.16
komment      0.16
COMMENTS     0.15
commenting   0.15
fifo         0.15
comments     0.15
comment      0.15
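The two logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits down or up. The sketch below illustrates one common way such lists are produced. It assumes that a token's value is the dot product of the feature's decoder direction with that token's unembedding vector. That assumption is not stated by the dashboard. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `unembed` are hypothetical, and the 2-dimensional vectors are made up for illustration.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# ASSUMPTION: effect(token) = dot(decoder_dir, unembed[token]); toy vectors only.

def logit_effects(decoder_dir, unembed):
    """Map each token to the dot product of the feature's decoder direction
    with that token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return negative, positive


# Made-up 2-dimensional vectors, purely for illustration.
unembed = {
    "comment": [0.9, 0.1],
    "Comments": [0.8, 0.3],
    "Cop": [-0.7, 0.2],
    "Pink": [-0.4, -0.1],
    "fifo": [0.2, 0.0],
}
decoder_dir = [0.2, 0.05]

effects = logit_effects(decoder_dir, unembed)
neg, pos = top_and_bottom(effects, k=2)
for tok, val in neg + pos:
    print(f"{tok:<10}{val:+.3f}")
```

A real model would use the full unembedding matrix over the entire vocabulary. The ranking step would stay the same: sort once, then read both ends.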
Activation Density: 0.017%
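Activation density is the share of token positions on which the feature is active. The sketch below shows that calculation and converts the reported 0.017% into more intuitive rates. It assumes that "active" means an activation strictly greater than zero. That threshold is an assumption, not something the dashboard states. The function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: activation density as the fraction of token positions
# where the feature fires. ASSUMPTION: "fires" means activation > 0.

def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: the feature fires on 1 of 10 tokens -> density 0.1 (10%).
toy = [0.0] * 9 + [1.3]
print(f"toy density: {activation_density(toy):.1%}")

# Interpreting the reported 0.017% figure.
density_pct = 0.017
rate = density_pct / 100            # fraction of tokens, about 0.00017
per_100k = rate * 100_000           # about 17 tokens per 100,000
one_in = 1 / rate                   # about one token in 5,900
print(f"~{per_100k:.0f} per 100k tokens, ~1 in {one_in:,.0f}")
```

At this density the feature fires on roughly 17 of every 100,000 tokens, so it is a very sparse feature.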