Explanations
phrases related to social interactions and interpersonal relationships
Negative Logits
ogle        -0.69
elight      -0.68
pletion     -0.67
duc         -0.67
kie         -0.66
bara        -0.66
Reader      -0.65
livion      -0.65
phies       -0.64
ruption     -0.60
Positive Logits
mutually     1.46
together     1.16
insepar      1.16
jointly      1.10
respectively 1.04
together     1.01
mutual       1.00
Together     0.98
respective   0.97
agree        0.96
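A minimal sketch of how ranked logit lists like the two above are commonly produced. It assumes (the page does not say so) that each value is the dot product of the feature's decoder direction with one token's column of the model's unembedding matrix, ignoring the final LayerNorm. The function names `logit_weights` and `top_and_bottom`, the token names, and all numbers are hypothetical toy values, not this feature's real weights.

```python
# Minimal sketch: rank vocabulary tokens by a feature's direct effect on the logits.
# Assumption (not from the source page): effect = decoder_dir . W_U[:, token],
# ignoring the final LayerNorm. All names and numbers here are hypothetical.

def logit_weights(decoder_dir, unembed):
    """Dot product of the decoder direction with every vocab column of W_U.

    decoder_dir: list of d_model floats (the feature's decoder vector).
    unembed: d_model rows, each a list of vocab_size floats (W_U[d][v]).
    Returns one float per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[d] * unembed[d][v] for d in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_and_bottom(weights, vocab, k):
    """Return the k most positive and k most negative (token, weight) pairs."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])  # ascending
    bottom = [(vocab[i], weights[i]) for i in order[:k]]  # most negative first
    top = [(vocab[i], weights[i]) for i in reversed(order[len(order) - k:])]  # most positive first
    return top, bottom


if __name__ == "__main__":
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]  # hypothetical tokens
    decoder_dir = [1.0, 0.5, -0.5]                # toy d_model = 3
    unembed = [
        [1.0, 0.8, -0.3, -0.2],
        [0.5, 0.4, -0.2, -0.4],
        [-0.2, 0.0, 0.5, 0.6],
    ]
    weights = logit_weights(decoder_dir, unembed)
    top, bottom = top_and_bottom(weights, vocab, k=2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; at real vocabulary sizes a vectorized matrix product would replace the nested loops.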
Activation Density: 2.987%
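Activation density can be read as the share of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero over some sample of token positions; the helper `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation is strictly above `threshold`.

    Assumption: a feature counts as active on a token when its activation > 0.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy sample: 3 active positions out of 100 (hypothetical data).
    acts = [0.0] * 97 + [1.2, 0.4, 3.0]
    print(f"Activation density: {activation_density(acts):.3%}")
```

Under this reading, a density of 2.987% would mean the feature is active on roughly 3 of every 100 sampled tokens.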