INDEX
Explanations
proper nouns
references to entities or concepts starting with the letter 'G'
Negative Logits
Token       Logit
terday      -0.93
vou         -0.66
htt         -0.66
stic        -0.65
sight       -0.63
=-=-        -0.63
Doct        -0.63
nsics       -0.61
kefeller    -0.61
Antiqu      -0.60
Positive Logits
Token       Logit
asp         1.16
ossip       0.99
irlfriend   0.94
GT          0.93
affe        0.93
CHQ         0.93
rowth       0.93
ONE         0.93
eeks        0.92
ather       0.92
Activations Density: 0.031%