Explanations
proper nouns related to individuals or specific entities
mentions of the name "Bar" in various contexts
Negative Logits
Token        Logit
lihood       -0.83
ãģ¦          -0.73
ãģį          -0.70
STATES       -0.69
ãĥ¼ãĥĨãĤ£    -0.69
ä            -0.68
ãĤ¼ãĤ¦ãĤ¹    -0.68
士           -0.68
ç«           -0.68
CRIP         -0.68
Positive Logits
Token        Logit
becue        1.21
itone        1.13
riers        1.11
rington      1.07
celona       1.06
bell         1.05
bara         1.04
keep         1.01
rier         0.99
rage         0.99
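The dashboard does not say how these logit values are produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the most positive and most negative entries; the sketch below assumes that convention. The function names (logit_effects, top_and_bottom), the toy matrix, and the token list are hypothetical and chosen only for illustration, not taken from the dashboard.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting a feature's decoder
    direction (length d_model) through an unembedding matrix
    (d_model rows x vocab columns). Assumed convention, not confirmed."""
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of unembed rows")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (descending) and the k
    lowest-scoring tokens (ascending) as (token, score) pairs."""
    if k <= 0:
        return [], []
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    bottom = [(tokens[t], scores[t]) for t in order[:k]]
    top = [(tokens[t], scores[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
tokens = ["a", "bara", "lihood", "c"]
decoder_dir = [1.0, -0.5]
unembed = [[0.2, 1.0, -0.3, 0.0],
           [0.4, -0.2, 0.6, 0.1]]
scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(scores, tokens, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this convention, a large positive value means the feature, when active, pushes the model toward predicting that token, and a negative value means it pushes away from it.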
Activation density: 0.014%
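Assuming "activation density" means the share of sampled token positions on which the feature's activation is strictly positive (the zero threshold is an assumption; the dashboard does not state it), 0.014% corresponds to roughly 14 firing positions per 100,000 tokens. A minimal sketch, with a hypothetical function name and a synthetic sample:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.
    The default threshold of 0.0 is an assumption, not the dashboard's stated rule."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Synthetic sample: 14 firing positions out of 100,000 tokens.
sample = [1.0] * 14 + [0.0] * (100_000 - 14)
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.014%"
```

A density this low marks a very sparse feature, which fits the narrow explanations above.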