INDEX
Explanations
elements of protest, critique, or strong expressions of dissent
Negative Logits
pite        -0.15
strand      -0.14
iteral      -0.14
owo         -0.13
_closure    -0.13
uyor        -0.13
olley       -0.13
hey         -0.13
quences     -0.13
ıc          -0.13
Positive Logits
/Open       0.17
decl        0.14
!           0.14
linky       0.14
obel        0.14
raft        0.13
ether       0.13
ise         0.13
ugen        0.13
plank       0.12
Activations Density: 0.198%