INDEX
Explanations
mentions of the word "jack."
instances of the word "jack"
Negative Logits
Frag      -0.77
Towns     -0.71
Viet      -0.70
Corpus    -0.70
CONT      -0.65
Parent    -0.65
uve       -0.64
Diet      -0.64
Schwar    -0.64
phrine    -0.64
Positive Logits
ety       1.06
nell      1.04
intosh    1.04
hammer    1.00
awaru     0.99
knife     0.99
nels      0.95
ument     0.94
assian    0.93
pots      0.93
Activations Density: 0.024%