Negative Logits
PBS         -0.08
ovar        -0.08
Loan        -0.08
housed      -0.08
remodel     -0.07
aturing     -0.07
Histogram   -0.07
Rental      -0.07
Campus      -0.07
Fal         -0.07
Positive Logits
GPT         0.09
sinful      0.08
lli         0.08
<|end|>     0.08
vivid       0.08
unethical   0.08
?↵↵↵↵       0.08
wrongdoing  0.08
GPT         0.08
THC         0.08
Activations Density 0.047%