Explanations
The presence of phrases that imply collaborative or supportive actions.
Negative Logits
Token     Logit
etten     -0.16
occo      -0.15
haps      -0.15
nesty     -0.14
zed       -0.14
ichel     -0.14
uren      -0.14
otal      -0.14
acle      -0.14
igion     -0.13
Positive Logits
Token     Logit
regard    0.34
stood     0.32
regards   0.30
standing  0.29
/by       0.26
holds     0.25
respect   0.24
drawing   0.24
nhau      0.23
holding   0.21
Activation density: 0.511%