INDEX
Explanations
terms and phrases associated with access permissions and movie-related actions
Negative Logits
O      -0.81
me     -0.79
op     -0.78
ph     -0.77
B      -0.74
o      -0.73
D      -0.73
v      -0.72
tr     -0.71
com    -0.71
Positive Logits
myſelf     1.61
itſelf     1.55
raiſ       1.44
ſeveral    1.43
himſelf    1.43
Reſ        1.40
theſe      1.37
Efq        1.36
Diſ        1.35
juſt       1.35
Activations Density: 0.046%