Explanations
names of individuals
mentions of the name "Beth"
Negative Logits
*/(          -0.89
00007        -0.79
xual         -0.75
agents       -0.69
aeda         -0.66
Executive    -0.66
fixation     -0.65
eers         -0.64
ACTION       -0.62
intent       -0.61
Positive Logits
lehem        1.23
Beth         1.17
terness      0.91
urst         0.90
ode          0.79
ãĤ¡          0.79
anie         0.78
Coh          0.78
Anne         0.76
®            0.76
Activations Density 0.005%