Explanations
names of individuals
proper nouns, specifically names
New Auto-Interp
Negative Logits
lawy          -0.68
Reviewer      -0.67
irlf          -0.66
Flavoring     -0.66
glers         -0.64
withstanding  -0.62
*/(           -0.62
avorite       -0.61
footed        -0.61
cause         -0.60
Positive Logits
ette          0.75
Wynne         0.73
idge          0.71
enne          0.71
atis          0.68
ettes         0.67
opa           0.65
gain          0.65
ello          0.64
illo          0.64
Activation density: 0.091%