INDEX
Explanations
references to the term 'Toy'
references to the "Toy Story" franchise and related elements
Negative Logits
Token         Logit
aunder        -0.73
terday        -0.72
xual          -0.72
icter         -0.70
livest        -0.70
etheless      -0.68
idency        -0.67
Hurricanes    -0.66
untreated     -0.65
andestine     -0.65
Positive Logits
Token         Logit
ota           1.14
shop          0.87
hammer        0.87
Toy           0.85
owa           0.81
Chest         0.81
oga           0.80
ograp         0.80
Crate         0.79
Toy           0.78
Activations Density 0.009%