INDEX
Explanations
phrases related to permissions and access restrictions
Negative Logits
Token          Logit
rör            -0.31
relative       -0.30
relying        -0.30
fml            -0.29
xml            -0.28
friv           -0.28
결             -0.28
Ueb            -0.28
ніципалі       -0.28
selected       -0.27
Positive Logits
Token          Logit
prohibited     0.95
forbid         0.93
prohibit       0.92
prohibitions   0.92
forbidden      0.91
prohibition    0.90
forbids        0.87
forbade        0.84
prohibits      0.84
禁止           0.83
Activations Density: 0.083%