Explanations
expressions emphasizing significance or particularity
Negative Logits
ÑģÑı       -0.15
adam       -0.15
tright     -0.14
illin      -0.14
amarin     -0.14
zk         -0.14
chnitt     -0.14
ogan       -0.13
adel       -0.13
indow      -0.13
Positive Logits
ones       0.21
those      0.16
ones       0.16
Ones       0.15
efa        0.15
when       0.15
eslint     0.14
revolving  0.13
Bracket    0.13
Shir       0.13
Activations Density: 0.024%