Explanations
the name "Harold" at varying levels of activation
instances of the name "Harold."
Negative Logits
eanor        -0.89
hetically    -0.83
psey         -0.81
igger        -0.81
insula       -0.80
ongs         -0.77
agogue       -0.74
oing         -0.73
arnaev       -0.73
ocrats       -0.73
Positive Logits
Harold       0.86
Lank         0.82
Kut          0.80
McGee        0.79
Vaj          0.77
Rupert       0.76
Melvin       0.74
balls        0.70
Weinstein    0.70
Cald         0.67
Activations Density: 0.020%