INDEX
Explanations
names of individuals, specifically focusing on the name "Ralph" with varying intensities
mentions of specific individual names
Negative Logits
glers       -0.96
mble        -0.74
tenance     -0.73
ly          -0.73
ppelin      -0.73
Asia        -0.69
flies       -0.68
ning        -0.67
kers        -0.66
lder        -0.66
Positive Logits
onse         0.98
Lauren       0.97
Wald         0.95
osal         0.86
onso         0.85
Miliband     0.84
abet         0.83
inating      0.78
ie           0.78
ides         0.74
Activations Density 0.041%