INDEX
Explanations
phrases related to making choices or decisions
pronouns and references to individuals or groups
Negative Logits
Token      Logit
vier       -0.73
standing   -0.72
wik        -0.71
gallery    -0.66
Sov        -0.66
ラン       -0.65
lining     -0.64
vine       -0.64
atory      -0.64
iny        -0.63
Positive Logits
Token        Logit
'd           0.91
'll          0.84
encountered  0.83
desired      0.81
wished       0.78
wish         0.77
frequ        0.76
deems        0.75
deem         0.75
ever         0.75
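
The page does not say how these logit values were produced. A common convention for feature dashboards is to project the feature's direction through the model's unembedding matrix, then list the tokens with the smallest and largest resulting scores. The sketch below illustrates that convention on toy data only; `top_logit_tokens`, the placeholder vocabulary, and every number in it are hypothetical and not taken from the model behind this page.

```python
def top_logit_tokens(direction, unembed, vocab, k=2):
    """Return (negative, positive) lists of (token, score), k entries each.

    direction: feature direction, a list of d floats (hypothetical values)
    unembed:   one list of d weights per vocabulary token
    vocab:     token strings, in the same order as the rows of unembed
    """
    # Score each token by the dot product of its row with the direction.
    scores = [
        (tok, sum(w * x for w, x in zip(row, direction)))
        for tok, row in zip(vocab, unembed)
    ]
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy data: four placeholder tokens in a 2-dimensional space.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = [[0.9, 0.1], [0.7, 0.2], [-0.6, 0.1], [-0.8, 0.3]]
direction = [1.0, 0.0]

negative, positive = top_logit_tokens(direction, unembed, vocab)
```

Under this reading, a large positive score means the feature pushes the model toward predicting that token next, and a large negative score means it pushes away from it.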
Activations Density: 0.122%
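
Activation density is usually read as the share of sampled tokens on which the feature fires. The page does not define it, so the sketch below assumes that "fires" means an activation strictly above a threshold of zero; `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: one active position out of eight.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under the same assumption, a density of 0.122% would correspond to roughly one active token in every 820.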