INDEX
Explanations
names of individuals
references to specific individuals and entities, particularly Salman and Malik
Negative Logits
sten      -0.98
shire     -0.93
nec       -0.88
romy      -0.86
oby       -0.85
oly       -0.84
pson      -0.82
onomy     -0.81
cephal    -0.81
racted    -0.81
Positive Logits
ngth      0.74
Ùİ        0.71
Madonna   0.69
Miranda   0.67
ãĥŀ       0.67
ع         0.65
Malik     0.65
miah      0.64
aeda      0.64
pigeon    0.64
Activations Density: 0.036%
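For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the feature's activation is above zero. The page does not state its exact definition, so the zero threshold, the function name `activation_density`, and the sample data are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    `activations` is a flat sequence with one value per token position.
    The zero threshold is an assumption, not taken from the dashboard.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical data: 10,000 token positions, 4 of which activate.
    acts = [0.0] * 10_000
    for i in (12, 480, 7321, 9999):
        acts[i] = 1.5
    print(f"{activation_density(acts):.3%}")
```

Under this reading, a density of 0.036% would mean the feature fires on roughly 36 out of every 100,000 tokens.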