INDEX
Explanations
mentions of relationships or interactions between individuals
instances of the word "the", with a focus on topics related to companionship and family dynamics
Negative Logits
herer       -0.78
hazard      -0.77
ibaba       -0.77
cens        -0.75
pai         -0.71
Layer       -0.71
STD         -0.70
esthesia    -0.70
iversity    -0.70
peak        -0.70
Positive Logits
duo             1.80
pair            1.72
trio            1.57
twins           1.50
latter          1.41
couple          1.40
brothers        1.30
two             1.21
siblings        1.21
similarities    1.20
Activations Density 0.310%
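
The density figure above is commonly read as the fraction of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; it assumes "active" means an activation strictly greater than zero, and the function name `activation_density` and the sample data are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 10,000 tokens, 31 of which have a non-zero activation.
sample = [0.0] * 9969 + [1.0] * 31
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: on the made-up sample it fires on 31 tokens out of every 10,000.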