INDEX
Explanations
phrases indicating an increase or emphasis on quantity
Negative Logits
oud     -0.16
ogne    -0.16
uben    -0.14
isko    -0.14
ritch   -0.14
kest    -0.14
ourg    -0.14
rat     -0.14
buff    -0.13
çĨ      -0.13
Positive Logits
idges   0.18
isons   0.16
-than   0.15
mary    0.15
idge    0.15
-git    0.14
vari    0.14
pery    0.14
zes     0.14
-live   0.14
Activation Density: 0.021%