Negative Logits
ulative      -0.73
vae          -0.70
oche         -0.68
ulz          -0.67
arer         -0.66
itched       -0.64
famous       -0.64
ilde         -0.61
Revel        -0.61
tal          -0.61

Positive Logits
permission    1.14
permissions   1.02
waivers       0.85
granted       0.83
slips         0.82
Reviewer      0.80
clearance     0.80
authorizing   0.79
eous          0.73
confir        0.73

Activation Density: 0.024%
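
The dashboard does not say how these numbers are produced. The sketch below assumes a per-token logit-effect vector and a per-position activation array are already available, and shows one plausible way to derive the two tables (the k most negative and k most positive entries) and the density figure (the fraction of token positions where the feature is active). The function names and the zero activation threshold are hypothetical choices for illustration, not the dashboard's actual method.

```python
import numpy as np

def top_logit_effects(effects, vocab, k=10):
    """Hypothetical helper: return the k most negative and k most positive
    (token, value) pairs from a per-token logit-effect vector."""
    order = np.argsort(effects)  # indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose activation
    exceeds `threshold` (assumed 0.0, i.e. the feature fires at all)."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Usage with a few values taken from the tables above.
vocab = ["ulative", "vae", "permission", "permissions", "granted"]
effects = np.array([-0.73, -0.70, 1.14, 1.02, 0.83])
neg, pos = top_logit_effects(effects, vocab, k=2)
print(neg)  # most negative first
print(pos)  # most positive first

# A density of 0.024% corresponds to 24 active positions out of 100,000.
acts = np.zeros(100_000)
acts[:24] = 1.0
print(f"{activation_density(acts):.3%}")
```

Sorting once and slicing both ends keeps the two tables consistent with each other; a real pipeline over a large vocabulary might prefer `np.argpartition` to avoid a full sort.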