INDEX
Explanations
specific instances of actions or descriptions of actions
interrogative phrases or words indicating questioning or making inquiries
Negative Logits
}}}           -0.52
WARNING       -0.51
VIDEOS        -0.51
tro           -0.47
))))          -0.46
Instructions  -0.46
dwindling     -0.45
tnc           -0.45
ovember       -0.45
unanswered    -0.44
Positive Logits
they  1.36
she   1.29
he    1.28
they  1.19
she   1.04
THEY  0.95
we    0.91
SHE   0.90
he    0.85
They  0.82
Activations Density 0.657%