INDEX
Explanations
specific references or objects mentioned within a broader context
the word "which" and its frequency in various contexts
Negative Logits
aiden      -0.82
mind       -0.71
bart       -0.70
bt         -0.68
politics   -0.68
ben        -0.66
marg       -0.66
usk        -0.65
trap       -0.65
Haunted    -0.65
Positive Logits
soever     1.11
we         0.95
they       0.93
he         0.91
she        0.84
there      0.73
you        0.66
I          0.66
millions   0.66
it         0.63
Activations Density: 0.039%