INDEX
Explanations
1 followed by numbers or punctuation
Negative Logits
ak          0.68
is          0.64
u           0.64
1           0.63
2           0.63
ik          0.59
おります    0.58
5           0.58
ok          0.57
9           0.57
POSITIVE LOGITS
unimagin    0.54
ardent      0.52
s           0.50
mittler     0.48
hermit      0.46
côté        0.45
哎          0.45
lovable     0.45
avvic       0.45
hơn         0.45
Activations Density 0.641%