INDEX
Explanations
phrases related to people and interactions
references to individuals, particularly women and friends, in various contexts
Negative Logits
è£ıç            -0.74
;;;;;;;;;;;;    -0.71
SPONSORED       -0.69
olds            -0.68
ĸļ              -0.68
assies          -0.66
aeda            -0.66
estyles         -0.66
ologies         -0.63
ernels          -0.62
Positive Logits
alyst           0.84
named           0.82
sergeant        0.77
spokeswoman     0.77
staffer         0.77
wrote           0.75
reviewer        0.74
responded       0.74
spokesman       0.72
spokesperson    0.72
Activations Density 0.302%