INDEX
Explanations
phrases that denote granting permission or enabling actions
Negative Logits
Token               Logit
(not shown)         -0.50
a                   -0.47
T                   -0.44
↵↵                  -0.44
and                 -0.43
the                 -0.43
↵                   -0.42
pattern             -0.42
A                   -0.42
high                -0.42
Positive Logits
Token               Logit
ALLOWED             0.87
Allows              0.85
allows              0.85
organise            0.85
allowing            0.83
Allows              0.81
berdayakan          0.80
letting             0.80
desmotivaciones     0.80
Allowing            0.80
Activations Density: 0.317%
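
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is nonzero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.317% would mean the feature fires on roughly 3 of every 1000 token positions.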