INDEX
Explanations
phrases related to actions and mechanics
Negative Logits
Token    Logit
繁       -0.15
oris     -0.15
868      -0.15
.sul     -0.14
eb       -0.14
olduk    -0.14
loff     -0.14
Gam      -0.14
£        -0.14
964      -0.14
Positive Logits
Token        Logit
其中         0.18
among        0.17
included     0.17
featured     0.16
enberg       0.15
special      0.15
especially   0.15
apart        0.15
throughout   0.15
among        0.15
Activations Density: 0.020%