INDEX
Explanations
references to toys
Negative Logits
Token        Logit
xual         -0.74
idency       -0.71
ancock       -0.70
mary         -0.65
pard         -0.64
sclerosis    -0.63
Liberties    -0.62
icago        -0.62
judgement    -0.61
clair        -0.61
Positive Logits
Token        Logit
toys         1.03
toy          0.94
Toys         0.92
Crate        0.88
ota          0.85
slot         0.85
geon         0.85
Shop         0.82
haus         0.82
ulus         0.81
Activations Density: 0.015%