INDEX
Explanations
words related to interpersonal relationships and interactions between people
references to relationships and social groups
Negative Logits
cer           -0.80
Kuala         -0.64
bilateral     -0.61
aton          -0.60
BN            -0.57
du            -0.57
sum           -0.56
cos           -0.56
signage       -0.55
ray           -0.55
Positive Logits
selves         1.03
hip            0.98
folk           0.98
hips           0.93
ervative       0.91
mates          0.87
heet           0.81
mith           0.80
counterparts   0.77
ystem          0.77
Activations Density 0.145%