INDEX
Explanations
phrases that indicate the application or enforcement of rules or conditions
Negative Logits
estra       -0.50
нена        -0.47
Weinberg    -0.46
ESTRA       -0.45
Thornton    -0.44
ghosts      -0.43
Monfieur    -0.43
Ghost       -0.42
otor        -0.42
hänen       -0.42
Positive Logits
apply       1.58
applies     1.51
apply       1.48
Apply       1.45
Apply       1.41
Applies     1.32
APPLY       1.29
Applies     1.24
APPLY       1.24
applied     1.24
Activations Density 0.026%
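
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires at all.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 26 of which activate the feature.
sample = [0.0] * 99_974 + [1.5] * 26
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.026% would correspond to the feature firing on roughly 26 of every 100,000 tokens.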