Explanations
references to people and their relationships
Negative Logits
osphere         -0.76
trop            -0.74
fecture         -0.72
abase           -0.72
availability    -0.71
flush           -0.69
orgetown        -0.69
Indust          -0.68
verage          -0.68
iti             -0.68
Positive Logits
selves          1.11
brothers        1.03
siblings        1.00
selves          0.94
sisters         0.92
were            0.92
minds           0.91
duo             0.91
reunited        0.89
twins           0.89
Activations Density: 0.262%