INDEX
Explanations
statements or assertions
instances of the phrase "the fact that."
Negative Logits
aukee      -0.77
Si         -0.72
pec        -0.70
vc         -0.69
uttering   -0.66
ocking     -0.64
yn         -0.64
ately      -0.63
Eye        -0.63
wn         -0.63
Positive Logits
someone    0.96
they       0.94
nobody     0.87
there      0.84
somebody   0.84
hindsight  0.83
we         0.81
everyone   0.79
anyone     0.76
humans     0.76
Activations Density: 0.084%