INDEX
Explanations
abbreviations and specific codes
proper nouns and unique identifiers, particularly those related to place names and entities
Negative Logits
Token         Logit
Mub           -0.71
eleph         -0.71
MAD           -0.70
gobl          -0.70
Bots          -0.69
pione         -0.69
Ambro         -0.69
ortunately    -0.68
Bil           -0.67
Comet         -0.67
Positive Logits
Token         Logit
gra           0.90
arde          0.79
ary           0.79
static        0.74
ARY           0.74
pub           0.73
house         0.72
House         0.72
rie           0.72
say           0.72
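Scores like these are usually read as the feature's direct effect on the output vocabulary. The following is a minimal sketch, assuming each score is the dot product of the feature's decoder direction with the corresponding column of the model's unembedding matrix. The names `decoder_dir`, `W_U`, and `logit_effects` are hypothetical, and a real dashboard may also fold in normalization before ranking.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: decoder_dir has shape (d_model,), W_U has shape
    (d_model, vocab_size), and vocab maps column index -> token string.
    """
    scores = decoder_dir @ W_U            # one score per vocabulary token
    order = np.argsort(scores)            # indices sorted by ascending score
    neg = [(vocab[i], float(scores[i])) for i in order[:k]]
    pos = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example with made-up values (not the data shown above).
neg, pos = logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.9, -0.7, 0.1, 0.5], [0.0, 0.0, 0.0, 0.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other, which matches how the two tables mirror one another.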
Activation Density: 0.262%
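Activation density is commonly taken to mean the share of token positions on which the feature fires. Below is a minimal sketch, assuming "fires" means an activation strictly greater than a threshold of zero. The function name `activation_density` and the `threshold` parameter are hypothetical and are not taken from the dashboard.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 2 of 8 positions are active, so the density is 25.0 (%).
density = activation_density([0, 0, 0.3, 0, 1.2, 0, 0, 0])
```

At a density of 0.262%, the feature would be active on roughly 1 token in 380, which fits the narrow category of abbreviations and proper nouns described in the explanations.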