INDEX
Explanations
"so" followed by a pronoun
"so now", "so understanding", "so every"
Negative Logits
ó         0.94
I         0.73
in        0.73
و         0.72
us        0.71
and       0.66
B         0.66
ल         0.64
map       0.64
sympath   0.63
Positive Logits
ان        0.92
он        0.80
isn       0.79
か        0.76
an        0.75
к         0.73
인한      0.73
ва        0.70
it        0.70
an        0.69
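The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). One common way to produce such lists is to multiply the feature's decoder direction by the model's unembedding matrix and sort the result. The dashboard does not confirm that it uses this method, so treat this as an assumption. The sketch below uses plain Python and made-up toy numbers. `logit_effects`, `top_logits`, `decoder_dir`, and `w_unembed` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction through
    an unembedding matrix (d_model rows x vocab columns), giving one
    logit contribution per vocabulary token."""
    if len(decoder_dir) != len(w_unembed):
        raise ValueError("decoder_dir length must equal number of unembedding rows")
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or most
    negative (positive=False) logit contributions."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1],
                    reverse=positive)
    return ranked[:k]


# Toy example (assumed numbers): d_model = 2, vocabulary of 3 tokens.
tokens = ["so", "he", "the"]
decoder_dir = [1.0, 0.5]
w_unembed = [[0.2, 1.0, -0.5],
             [0.4, -1.0, 0.0]]

effects = logit_effects(decoder_dir, w_unembed)
print(top_logits(effects, tokens, k=2, positive=True))   # [('he', 0.5), ('so', 0.4)]
print(top_logits(effects, tokens, k=1, positive=False))  # [('the', -0.5)]
```

Sorting the full vocabulary is fine for a sketch. For a real vocabulary of tens of thousands of tokens, a partial top-k selection would be cheaper.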
Activation density: 0.819%
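Activation density is usually read as the share of token positions in a sample on which the feature fires. That convention is an assumption here, not something the dashboard states. Under it, 0.819% corresponds to roughly 8 of every 1,000 tokens. The following minimal sketch uses a hypothetical `activation_density` helper and made-up activations. It treats a token as active when its activation is strictly above zero.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (zero by default), as is typical for
    ReLU-style sparse features.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 1 active position out of 4 gives 25.0 (percent).
print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```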