Explanations
phrases related to permission and restrictions
Negative Logits
Token         Logit
cuál          -0.51
problemet     -0.48
toiminta      -0.47
Hva           -0.46
rencont       -0.45
cuáles        -0.44
inversores    -0.44
rencontré     -0.43
Moscú         -0.43
rencontrer    -0.42
Positive Logits
Token         Logit
Allowed       1.00
allowed       0.93
allowed       0.92
ALLOWED       0.92
Allowed       0.81
ALLOWED       0.78
permitted     0.76
permitted     0.75
extAlignment  0.73
Forbidden     0.71
Activations Density 0.015%