INDEX
Explanations
words related to causality and actions in past scenarios
references to the pronoun "it" and quantifiers in a variety of contexts
Negative Logits
Making              -0.63
tern                -0.62
Hick                -0.60
Pants               -0.59
lash                -0.59
Comfort             -0.58
bell                -0.57
Invisible           -0.57
Marion              -0.57
Ark                 -0.56
Positive Logits
evidenced           0.90
actionGroup         0.89
Âł Âł Âł Âł             0.79
[|                  0.78
wont                0.78
Âł Âł Âł Âł Âł Âł Âł Âł     0.75
previously          0.71
çͰ                  0.70
attest              0.67
());                0.66
Activations Density 0.186%