INDEX
Explanations
mentions of a specific person, in this case, "Ford"
Negative Logits
Token        Logit
Artemis      -0.75
legraph      -0.70
certific     -0.69
ablishment   -0.67
Alien        -0.67
anwhile      -0.66
Sakuya       -0.65
referen      -0.65
hemat        -0.63
Seym         -0.62
Positive Logits
Token        Logit
ham          1.16
shire        0.97
ragon        0.96
bies         0.89
nec          0.79
isle         0.79
ring         0.78
zie          0.77
bie          0.76
clad         0.75
Activations Density 0.003%