INDEX
Explanations
questions starting with the word "What" or "Who"
rhetorical questions
Negative Logits
Token      Logit
://        -0.74
="#        -0.68
avor       -0.58
pistols    -0.58
Heist      -0.55
Mobil      -0.55
KO         -0.55
conv       -0.55
zzy        -0.55
corrid     -0.54
Positive Logits
Token      Logit
ean        0.79
uh         0.72
somew      0.71
Subst      0.67
ersen      0.67
ulhu       0.65
um         0.65
besides    0.64
anwhile    0.63
isphere    0.62
Activations Density 0.100%