INDEX
Explanations
phrases related to sharing experiences or resources
Negative Logits
Token        Logit
931          -0.20
ukan         -0.15
stup         -0.15
########.    -0.15
uft          -0.15
vrai         -0.14
teness       -0.14
umper        -0.14
FRING        -0.14
uber         -0.14
Positive Logits
Token        Logit
share        0.75
share        0.60
Share        0.54
SHARE        0.53
-share       0.51
Share        0.51
fair         0.50
.share       0.47
_share       0.46
shares       0.41
Activations Density 0.046%
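
The density figure above is commonly read as the fraction of token positions on which the feature fires at all. The following is a minimal sketch of that calculation, assuming density is defined as the share of positions whose activation exceeds a threshold of zero; the function name `activation_density`, the threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, with the feature firing on 5.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.046% would correspond to the feature being active on roughly 46 of every 100,000 token positions.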