INDEX
Explanations
"then" followed by a pronoun or a time
Negative Logits
de             0.62
us             0.60
ax             0.60
0              0.57
ak             0.56
ind            0.56
ist            0.55
rồi            0.55
Затем          0.54
ts             0.54
Positive Logits
neither        0.56
it             0.52
indeed         0.52
there          0.50
thisobject     0.46
encontramos    0.46
encuentre      0.46
こそ           0.46
まさに         0.46
miyor          0.46
Activations Density 0.010%
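
As a rough illustration of the density figure, the sketch below assumes that "Activations Density" means the fraction of sampled tokens on which the feature has a nonzero activation; the dashboard itself does not state this. The function name, the threshold parameter, and the sample data are all hypothetical. Under that assumption, 0.010% corresponds to roughly one active token per 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of sampled tokens with a nonzero
    activation. This is a hypothetical helper, not the dashboard's own code.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: one active token among 10,000.
sample = [0.0] * 9999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.010%
```

A density this low would suggest a sparse feature that fires only in a narrow context, which fits the short explanation given at the top.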