INDEX
Explanations
phrases related to legal permissions
Negative Logits
ulative    -0.78
arer       -0.72
sonian     -0.71
nces       -0.70
oche       -0.69
vae        -0.66
famous     -0.63
enegger    -0.61
ilde       -0.61
Lans       -0.61
Positive Logits
permission     1.11
permissions    0.91
Reviewer       0.86
slips          0.80
clearance      0.80
waivers        0.78
granted        0.77
ittee          0.77
ibl            0.76
consent        0.75
Activations Density 0.016%
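
The density figure is read here as the percentage of tokens on which the feature activates at all; the source does not define it, so that reading is an assumption. A minimal sketch of the arithmetic, using a hypothetical `activation_density` helper and made-up activation values rather than real data for this feature:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 2 of 12,500 tokens.
sample = [0.0] * 12_498 + [3.2, 1.7]
density = activation_density(sample)
print(f"{density:.3f}%")
```

A density this low would mean the feature is silent on almost every token, which fits a narrow explanation such as the one given above.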