Negative Logits
nast          -0.07
safe          -0.06
buses         -0.06
pro           -0.06
penalty       -0.06
382           -0.06
.social       -0.06
cyn           -0.06
nob           -0.06
.tax          -0.06
Positive Logits
profoundly     0.06
GetComponent   0.06
Researchers    0.06
.Logf          0.06
-dominated     0.06
.startswith    0.06
traveler       0.06
FE             0.06
angler         0.06
CEED           0.06
Activations Density: 0.008%