Explanations
mentions of lost items or pets
Negative Logits
aza         -0.17
hiro        -0.17
_ALIGNMENT  -0.15
ods         -0.15
ì¡          -0.15
uelles      -0.14
urf         -0.14
beth        -0.14
forced      -0.14
erais       -0.13
Positive Logits
inton       0.15
voc         0.15
ози         0.14
omit        0.14
_pemb       0.14
·»          0.14
dist        0.14
.rpm        0.14
dil         0.14
Brock       0.14
Activations Density: 0.141%