Explanations
mentions of twins
references to twins
Negative Logits
Token        Logit
SOURCE       -0.74
Clin         -0.72
anwhile      -0.72
Frag         -0.70
vernment     -0.69
ãģĵ          -0.66
Retrieved    -0.66
UME          -0.66
Period       -0.66
             -0.65
Positive Logits
Token        Logit
twin          1.17
twins         0.91
brother       0.89
ning          0.84
sister        0.82
ned           0.80
brothers      0.79
sibling       0.77
ieth          0.74
pillars       0.71
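The logit tables rank vocabulary tokens by how strongly the feature pushes their logits up or down. The sketch below shows one common way such a ranking can be produced. It is a minimal sketch that assumes the value for each token is the dot product of the feature's decoder direction with that token's column of the unembedding matrix (W_U). The dashboard's exact computation, including any normalization, is not stated here. The names `logit_effects`, `top_and_bottom`, and all of the toy numbers are hypothetical.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect(token) = decoder_dir . W_U[:, token]; the real dashboard
# may apply extra scaling or normalization that is not modeled here.

def logit_effects(decoder_dir, unembed, vocab):
    """decoder_dir: d_model floats (the feature's decoder vector).
    unembed: d_model rows, each with len(vocab) floats (W_U).
    Returns a list of (token, effect) pairs, one per vocabulary entry."""
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of W_U.
        effect = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        effects.append((token, effect))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy data (invented for illustration): d_model = 2, vocabulary of 4 tokens.
vocab = ["twin", "brother", "Period", "SOURCE"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.6, -0.4, -0.5],
    [0.2, 0.4, -0.4, -0.6],
]
effects = logit_effects(decoder_dir, unembed, vocab)
top, bottom = top_and_bottom(effects, k=2)
print("positive:", top)
print("negative:", bottom)
```

With real model weights, the same two lists would be cut at k = 10 to match the tables above.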
Activation density: 0.003%
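Activation density is usually read as the fraction of token positions on which the feature fires. A density of 0.003% means the feature is active on roughly 3 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly greater than zero. The evaluation dataset and the exact threshold used for this dashboard are not stated. The function name `activation_density` and the toy activations are hypothetical.

```python
# Hypothetical sketch: fraction of token positions where a feature is active.
# Assumption: "active" means activation > threshold (default 0.0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example (invented data): 3 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
acts[10] = acts[500] = acts[9_999] = 2.3
density = activation_density(acts)
print(f"{density:.3%}")
```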