INDEX
Explanations
references to the act of swallowing or consuming
Negative Logits
ACC     -0.16
755     -0.16
872     -0.15
pose    -0.15
106     -0.14
zers    -0.14
andon   -0.14
Mort    -0.14
opot    -0.14
äl      -0.14
Positive Logits
swallowing     0.16
swallowed      0.15
swallow        0.15
orney          0.15
-abs           0.14
oví            0.14
-urlencoded    0.14
すす           0.14
URN            0.14
-pills         0.14
Activations Density 0.092%