Explanations
references to bars or similar establishments
Negative Logits
kul      -0.21
kiego    -0.17
empo     -0.17
ene      -0.17
kup      -0.16
kr       -0.15
kola     -0.15
enant    -0.15
eners    -0.15
gang     -0.15
Positive Logits
bara     0.28
oque     0.25
tering   0.24
coded    0.24
Harbor   0.24
riers    0.23
coding   0.23
tered    0.22
becue    0.22
rios     0.22
Activations Density 0.012%