INDEX
Negative Logits
osponsors    -0.90
acers        -0.72
none         -0.70
ummies       -0.69
orthy        -0.69
chairs       -0.68
KER          -0.68
acons        -0.66
ANS          -0.65
cs           -0.64

Positive Logits
thing        1.36
heartedly    1.31
hearted      1.21
ordeal       1.19
affair       1.08
damn         0.98
idea         0.97
endeavor     0.91
debacle      0.90
thing        0.90
Activations Density: 0.033%