INDEX
Explanations
questions starting with the word "What"
instances of the word "What" and its variations, particularly in questions
Negative Logits
interstitial    -0.73
768             -0.62
Ń·              -0.62
Bey             -0.62
Discover        -0.58
atis            -0.56
recy            -0.55
ffe             -0.55
ean             -0.55
uchs            -0.54
Positive Logits
soever           1.43
happened         1.15
happens          1.11
?!               1.05
else             1.01
happ             1.00
?!"              0.97
!?               0.95
!?"              0.88
bothers          0.83
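Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the tokens whose logits it raises or lowers most. The sketch below assumes that method (the page does not state how its lists were computed); `decoder_dir`, `unembed`, and `vocab` are hypothetical inputs, and any layer-norm folding a real pipeline might apply is omitted.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive token logit effects.

    Assumption: decoder_dir is the feature's decoder row (length d_model) and
    unembed is W_U laid out as d_model rows of vocab-size floats. Both are
    hypothetical inputs, not taken from the original page.
    """
    d_model = len(decoder_dir)
    vocab_size = len(vocab)
    # Direct logit effect for each token j: dot(decoder_dir, W_U[:, j]).
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(vocab_size), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy example with d_model = 2 and a three-token vocabulary.
    neg, pos = top_logit_effects(
        [1.0, -0.5],
        [[0.2, 1.0, -0.3], [0.4, -0.2, 0.6]],
        ["a", "b", "c"],
        k=1,
    )
    print(neg, pos)
```

In the toy example, token "c" receives the most negative effect (about -0.6) and token "b" the most positive (about 1.1).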
Activation Density: 0.077%
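Activation density is usually read as the percentage of tokens on which the feature fires, so 0.077% corresponds to roughly 77 tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the `threshold` parameter and the function name are hypothetical, not part of the original page.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # One active token out of four gives a density of 25%.
    print(activation_density([0.0, 0.0, 0.5, 0.0]))
```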