INDEX
NEGATIVE LOGITS
perjury    -0.44
Papers     -0.43
cember     -0.43
Dialog     -0.41
ONSORED    -0.41
Sakuya     -0.41
Period     -0.41
Marcos     -0.41
Seaf       -0.40
filing     -0.40
POSITIVE LOGITS
organis    0.48
isse       0.47
zees       0.47
rush       0.46
hift       0.46
iets       0.46
rha        0.46
ravel      0.46
spons      0.45
kin        0.45
Activations Density: 0.023%