INDEX
Explanations
questions suggesting uncertainty or lack of knowledge
the phrase "who knows" and variations
Negative Logits
ciating      -0.86
herent       -0.74
ItemTracker  -0.69
aten         -0.67
Rated        -0.65
cially       -0.64
lies         -0.63
inance       -0.63
esthesia     -0.63
issance      -0.62
Positive Logits
fri          0.64
scen         0.63
how          0.62
ROR          0.61
rium         0.59
Kitt         0.59
amorph       0.59
sew          0.58
fuzz         0.58
sunshine     0.58
Activations Density: 0.047%