Explanations
questions beginning with "What do"
interrogative phrases that ask for opinions or information
Negative Logits
Token       Logit
fox         -0.66
WAYS        -0.64
Bridges     -0.63
mint        -0.62
legram      -0.62
agonists    -0.61
visory      -0.61
Dialogue    -0.59
boats       -0.59
Immunity    -0.59
Positive Logits
Token           Logit
?]              0.80
onga            0.71
wrong           0.69
omsday          0.66
transpired      0.66
?),             0.66
includ          0.63
glean           0.63
happen          0.62
characterize    0.62
Activations Density: 0.049%