INDEX
Explanations
specific categories of nouns, such as medical conditions, interpersonal relationships, personal attributes, and financial terms
terms related to personal relationships and individual circumstances
Negative Logits
ourselves     -0.83
Helpful       -0.78
yourselves    -0.72
Guan          -0.70
oneself       -0.67
themselves    -0.65
unison        -0.65
alike         -0.65
Rohing        -0.63
Beg           -0.61
Positive Logits
wife          0.96
girlfriend    0.92
buddies       0.89
counterpart   0.89
colleague     0.87
persona       0.87
mates         0.86
mates         0.84
counterparts  0.84
opic          0.83
Activations Density: 0.449%