Explanations
quotations or statements
Negative Logits
oses      -0.70
Guard     -0.70
lean      -0.66
respect   -0.63
aq        -0.62
gur       -0.62
Gaza      -0.60
le        -0.60
´         -0.59
eal       -0.59
Positive Logits
they       0.95
soever     0.90
there      0.89
THEY       0.89
we         0.86
nobody     0.83
unlike     0.81
although   0.79
*/(        0.76
it         0.74
Activation Density: 0.111%