INDEX
Explanations
references to privileges
NEGATIVE LOGITS
enfans         -0.42
psicología     -0.41
omge           -0.41
coches         -0.40
policías       -0.39
ciudadana      -0.39
fronteras      -0.39
Naciones       -0.38
bomberos       -0.38
kepercayaan    -0.38
POSITIVE LOGITS
resourceCulture  1.02
privileges       1.02
Privile          1.01
queſta           1.00
privile          1.00
<unused3>        1.00
[@BOS@]          1.00
<unused28>       0.99
<unused14>       0.99
<unused8>        0.99
Activations Density 0.188%