Explanations
phrases indicating differentiation or distinction from something else
NEGATIVE LOGITS
istr         -0.15
blem         -0.14
vs           -0.13
lio          -0.13
ashion       -0.13
tw           -0.13
acon         -0.13
stretched    -0.13
eter         -0.13
enge         -0.13
POSITIVE LOGITS
Roose           0.16
.sat            0.15
///<            0.15
SelectedItem    0.15
itself          0.14
ê·¼             0.14
infeld          0.14
Hooks           0.14
OfSize          0.13
_hooks          0.13
Activations Density 0.030%