INDEX
Explanations
references to visual perspectives or ways of seeing
Negative Logits
↵ ↵       -0.18
buie      -0.16
isle      -0.15
ideo      -0.15
åij³      -0.15
uebas     -0.15
ileo      -0.15
izik      -0.15
ollipop   -0.15
↵         -0.15
Positive Logits
finder    0.34
shed      0.31
ports     0.29
able      0.25
point     0.25
topic     0.25
pager     0.25
ings      0.23
points    0.22
xét       0.20
Activation density: 0.052%