NEGATIVE LOGITS
screened       -0.08
queries        -0.08
pretend        -0.08
ogl            -0.07
.Ap            -0.07
Tao            -0.07
/sc            -0.07
Boek           -0.07
constitu       -0.07
讯             -0.07
POSITIVE LOGITS
geraten         0.09
fireworks       0.08
uncont          0.08
uncontrolled    0.08
abandono        0.08
arly            0.08
helpless        0.07
ganas           0.07
fent            0.07
589             0.07
Activation Density: 0.018%
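Below is a minimal sketch of how lists like the ones above and a density figure might be computed. It rests on two assumptions: first, that a feature's logit effect is its decoder vector projected through the model's unembedding matrix, with no layer-norm correction; second, that density is the percentage of tokens on which the feature's activation is strictly positive. The function names, array shapes, and toy data are hypothetical and do not come from the dashboard's own code.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption (hypothetical): effect = decoder_vec @ W_U, where
    decoder_vec has shape (d_model,) and W_U has shape (d_model, vocab_size).
    """
    effects = decoder_vec @ W_U          # one logit effect per vocab token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(acts):
    """Percentage of tokens whose feature activation is strictly positive."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# Toy usage with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 0.1, -0.4],
                [0.0,  0.0, 0.0,  0.0]])
neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
# neg → [("d", -0.4), ("b", -0.2)], pos → [("a", 0.5), ("c", 0.1)]

acts = np.zeros(10_000)
acts[:2] = 3.0                           # feature fires on 2 of 10,000 tokens
density = activation_density(acts)      # → 0.02 (percent)
```

Under this reading, a density of 0.018% would mean the feature fires on roughly 18 tokens in every 100,000.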