INDEX
Explanations
instances where the phrase "what the" is followed by a description or question
occurrences of the word "the"
Negative Logits
alus            -0.78
avan            -0.77
ignt            -0.72
onduct          -0.70
thia            -0.69
dropping        -0.68
nav             -0.68
hops            -0.68
aunder          -0.66
velt            -0.66
Positive Logits
heck             1.66
hell             1.57
fuss             1.34
fuck             1.28
ramifications    0.99
future           0.96
implications     0.94
consequences     0.91
HELL             0.91
repercussions    0.88
Activations Density 0.084%