INDEX
Explanations
terms related to acceptance and rejection
Negative Logits
Token          Logit
ervoor         -0.50
electricity    -0.46
affairs        -0.46
שוליים         -0.46
the            -0.46
about          -0.45
DIRECTION      -0.43
propertyName   -0.43
query          -0.42
OPERATION      -0.42
Positive Logits
Token          Logit
accept         1.21
accepts        1.15
Accept         1.13
accepted       1.09
Accept         1.09
Accepted       1.09
ACCEPT         1.06
accepted       1.05
Accepting      1.04
Accepted       1.02
Activations Density: 0.234%
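As a rough illustration of what a density figure like this one measures, the sketch below assumes it is the share of token positions at which the feature's activation is above zero. The page does not state its exact definition, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of token positions where the
    feature fires at all. The dashboard's own definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up sample: 1000 token positions, the feature fires on 3 of them.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3f}%")
```

Under that assumption, a density of 0.234% would mean the feature is active on roughly 2 or 3 of every 1000 tokens, which fits a feature tied to a narrow family of tokens such as the "accept" variants listed above.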