Explanations
mentions of twins
references to twins
Negative Logits
Token        Logit
Liberties    -0.67
trak         -0.66
WI           -0.64
vious        -0.64
supp         -0.63
nce          -0.63
ld           -0.61
Popular      -0.60
isan         -0.59
Nou          -0.59
Positive Logits
Token        Logit
twins        1.38
poons        0.87
orphans      0.81
omnia        0.79
omething     0.79
idious       0.74
hips         0.73
folk         0.72
roo          0.72
peak         0.71
Activations Density 0.005%