INDEX
Explanations
specific keywords, possibly related to technical topics or computer programming
occurrences of the word "The" at the beginning of sentences
Negative Logits
poke                  -0.71
eno                   -0.67
ito                   -0.63
(undecodable token)   -0.63
ement                 -0.62
acea                  -0.62
asonic                -0.59
gpu                   -0.59
actory                -0.58
dding                 -0.58
Positive Logits
oret                  1.38
resa                  1.12
odore                 1.04
latter                0.99
downside              0.97
ories                 0.96
simplest              0.96
nce                   0.89
biggest               0.88
sis                   0.88
Activations Density 0.485%