Explanations
mentions of actions related to granting permissions or approvals
Negative Logits
Maker       -0.71
Cups        -0.71
Citation    -0.70
ancial      -0.69
virtue      -0.66
Prosper     -0.61
FTWARE      -0.60
Nadu        -0.60
Solitaire   -0.60
Beir        -0.59
Positive Logits
raham        1.10
usable       1.09
rog          1.07
omination    1.07
rid          1.03
ject         1.02
stract       0.98
dullah       0.97
duction      0.95
bey          0.95
Activation Density: 0.024%