Explanations
instances of the word "some" followed by positive descriptions or actions
Negative Logits
istan         -0.95
gon           -0.95
raid          -0.94
SPONSORED     -0.92
ipper         -0.92
gang          -0.90
peed          -0.88
ocene         -0.88
acus          -0.86
agents        -0.84
Positive Logits
place          1.35
ones           1.24
semblance      1.15
serious        1.13
body           1.12
consolation    1.07
additional     1.07
how            1.06
decent         1.02
sort           1.01
Activations Density 0.835%
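
The density figure above is most naturally read as the fraction of token positions on which the feature fires. The sketch below shows one way such a number could be computed. It assumes that "fires" means an activation strictly above a threshold of zero, which the page does not state. The function name `activation_density` and the sample activations are hypothetical and for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default). The source page does not
    specify the definition it uses.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for 8 token positions; only one position is active.
example = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(example):.3%}")
```

Under this reading, a density of 0.835% would mean the feature is active on roughly 8 of every 1,000 tokens sampled.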