Explanations
- References to the name "Victor", with varying strengths of activation
- The name "Victor"
Negative Logits
Token      Logit
shed       -0.80
BOOK       -0.76
earance    -0.74
eworld     -0.74
eling      -0.74
etary      -0.73
dress      -0.72
ness       -0.71
lease      -0.70
STER       -0.70
Positive Logits
Token      Logit
Hugo        0.98
Victor      0.95
ians        0.87
orian       0.86
ancouver    0.85
iana        0.85
inus        0.83
Yanuk       0.83
ines        0.82
inian       0.82
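The two lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. The page does not say how these scores are computed; a common approach for SAE feature dashboards, assumed here, is to take the dot product of the feature's decoder direction with each token's column of the unembedding matrix W_U, then keep the k largest and k smallest values. The sketch below uses a hypothetical 2-dimensional toy model with invented vectors, so its numbers are illustrative only and do not reproduce the values listed above.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect,
# ASSUMING effect[token] = dot(decoder_dir, W_U[:, token]).
# All vectors below are toy values, not real model weights.

def logit_effects(decoder_dir, unembed):
    """decoder_dir: list of floats (length d_model).
    unembed: dict mapping token -> its unembedding column (length d_model)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    bottom = ranked[:k]          # most suppressed tokens ("negative logits")
    top = ranked[::-1][:k]       # most promoted tokens ("positive logits")
    return top, bottom

# Toy 2-d example (hypothetical numbers).
decoder_dir = [1.0, 0.0]
unembed = {
    "Victor": [0.9, 0.2],
    "Hugo": [0.7, -0.4],
    "the": [0.0, 1.0],
    "shed": [-0.6, 0.3],
    "BOOK": [-0.5, 0.8],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", top)
print("negative:", bottom)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real vocabulary of tens of thousands of tokens would usually use a vectorized top-k instead.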
Activation Density: 0.021%
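A density of 0.021% corresponds to a fraction of 0.00021, i.e. roughly 21 active tokens per 100,000. The sketch below assumes density means the share of tokens on which the feature's activation is strictly greater than zero; the threshold and the function name are illustrative assumptions, not taken from the page.

```python
# Hypothetical sketch: activation density as the fraction of tokens
# where the feature fires, ASSUMING "fires" means activation > threshold (0 by default).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy data: 21 active tokens out of 100,000 (illustrative, not real activations).
acts = [0.0] * 99_979 + [1.5] * 21
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # formats the fraction as a percentage
```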