INDEX
Explanations
the word "bag" with a high activation level
references to bags
Negative Logits
vironment    -0.72
hower        -0.67
terday       -0.66
Galile       -0.65
issance      -0.65
relations    -0.64
spect        -0.62
preschool    -0.61
nesota       -0.60
spectrum     -0.60
Positive Logits
ging     1.23
bag      1.19
gie      1.15
gery     1.13
ged      1.12
bags     1.10
pipe     1.05
glers    0.99
rill     0.96
gers     0.96
Activations Density 0.016%
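
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number is computed: the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration; the page does not say how its own value was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "firing" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 2 firing positions out of 12,500 tokens.
sample = [0.0] * 12_498 + [3.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.016%"
```

Under this reading, a density of 0.016% means the feature fires on roughly 1 in every 6,250 tokens, which is consistent with a feature that responds to a narrow set of tokens such as "bag" and "bags".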