INDEX
Explanations
references to bars or similar establishments
Negative Logits
ese      -0.25
y        -0.20
naire    -0.20
ene      -0.20
esse     -0.18
ester    -0.18
aires    -0.18
estar    -0.18
end      -0.18
edb      -0.17
Positive Logits
riers    0.24
oque     0.20
mony     0.19
rows     0.19
celona   0.18
becue    0.18
asaki    0.17
rier     0.16
tered    0.16
rell     0.16
Activations Density: 0.027%