INDEX
Explanations
references to toys
Negative Logits
ulty           -0.81
transcripts    -0.75
icago          -0.70
idency         -0.66
transcript     -0.66
uth            -0.65
ornia          -0.62
aeda           -0.62
ignty          -0.62
mpeg           -0.61
Positive Logits
toys           0.96
boxes          0.91
ota            0.91
box            0.91
pole           0.90
dolls          0.89
slot           0.89
Crate          0.87
toy            0.85
bucks          0.82
Activations Density: 0.076%