INDEX
Explanations
proper nouns, particularly the name "Robin"
occurrences of the name "Robin."
Negative Logits
mble        -0.92
ormons      -0.76
ntil        -0.75
resso       -0.73
aternity    -0.71
ller        -0.70
gerald      -0.70
bye         -0.70
chnology    -0.70
ccording    -0.69
Positive Logits
Hood        1.23
Robin       0.98
ette        0.91
Robin       0.87
Williams    0.79
Mans        0.78
Lopez       0.78
Oliver      0.77
otte        0.75
Bean        0.75
Activations Density 0.006%