INDEX
Explanations
questions or statements involving people's thoughts, actions, or inquiries
phrases related to people's behaviors and interactions
Negative Logits
quished      -0.81
teasp        -0.69
srfAttach    -0.69
rontal       -0.66
poral        -0.66
overseen     -0.64
presided     -0.63
VIDIA        -0.62
executive    -0.61
Cu           -0.60
Positive Logits
themselves   0.88
alike        0.81
theirs       0.78
inge         0.71
THEIR        0.69
their        0.69
clam         0.69
selves       0.67
sense        0.67
their        0.67
Activations Density 0.484%