Explanations
words related to personal relationships and communication
Negative Logits
Token         Logit
]=            -0.67
bernatorial   -0.63
unction       -0.59
Wr            -0.58
PDATED        -0.57
served        -0.57
Heist         -0.56
adobe         -0.56
pestic        -0.55
summary       -0.52
Positive Logits
Token         Logit
whom          0.91
strangers     0.84
counterparts  0.83
outside       0.77
coworkers     0.74
peers         0.74
beforehand    0.73
during        0.71
abroad        0.71
mates         0.71
Activations Density 0.496%