INDEX
Explanations
phrases related to the identity or description of objects or concepts
Negative Logits
ertz        -0.16
/gui        -0.15
gnu         -0.15
ighter      -0.15
aversable   -0.15
udic        -0.15
erton       -0.14
idal        -0.14
irit        -0.14
esser       -0.14
Positive Logits
liers       0.16
dn          0.16
lies        0.15
_compat     0.15
mini        0.15
otti        0.15
seau        0.14
Fat         0.14
Hass        0.14
ji          0.14
Activations Density: 0.015%