INDEX
Explanations
declarative statements with the word "know" followed by some information or assertion
recurrent phrases asserting knowledge or awareness
Negative Logits
issance     -0.81
onding      -0.80
isco        -0.70
orthy       -0.69
phrine      -0.69
ermanent    -0.68
sidx        -0.67
pione       -0.66
achus       -0.65
aukee       -0.63
Positive Logits
firsthand   1.20
plenty      1.02
how         1.00
why         0.96
anecd       0.93
what        0.91
exactly     0.91
nothing     0.90
lots        0.89
many        0.85
Activations Density 0.040%