INDEX
Explanations
variations of the letter 'r' in text
Negative Logits
Token     Logit
ubl       -0.16
辺        -0.15
beaut     -0.15
pcs       -0.15
ows       -0.15
owie      -0.15
jin       -0.15
ä¿Ŀ       -0.15
idan      -0.14
roscope   -0.14
Positive Logits
Token     Logit
ifi       0.23
icer      0.20
iten      0.20
IFI       0.19
isc       0.19
aff       0.19
ius       0.18
ient      0.17
bene      0.17
ifier     0.17
Activations Density 0.004%