Explanations
phrases indicating origin or source
Negative Logits
abay     -0.15
ulk      -0.15
loff     -0.14
igli     -0.14
ebin     -0.14
ULK      -0.14
ouce     -0.14
bao      -0.14
icher    -0.14
hers     -0.13
Positive Logits
rome     0.15
Bund     0.15
ĵåIJį     0.15
odash    0.14
nes      0.14
oo       0.13
å±ĭ      0.13
ullivan  0.13
TRL      0.13
ollipop  0.13
Activations Density: 0.016%