INDEX
Explanations
the word "know" as a primary trigger
phrases expressing uncertainty or lack of knowledge
Negative Logits
Token        Logit
ItemTracker  -0.84
phrine       -0.83
otom         -0.82
uckland      -0.82
gdala        -0.82
sidx         -0.77
gencies      -0.75
ĪĴ           -0.74
ickr         -0.73
coni         -0.73
Positive Logits
Token        Logit
anymore      1.01
yet          0.80
how          0.80
anything     0.79
nor          0.74
yet          0.71
ledged       0.68
lege         0.67
exactly      0.67
until        0.67
Activations Density: 0.043%