Explanations
the mention of individuals and their affiliated organizations or roles
Negative Logits
Token     Logit
et        -0.20
ettle     -0.18
L         -0.17
rig       -0.17
ring      -0.16
rung      -0.16
ety       -0.16
rone      -0.15
eti       -0.15
ett       -0.15
Positive Logits
Token     Logit
owan      0.17
rier      0.15
hee       0.15
lets      0.15
ivable    0.15
overn     0.15
arry      0.14
arity     0.14
ourmet    0.14
ãĥ¼ãĥĵ    0.14
Activations Density 0.023%
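
The last positive-logit token, ãĥ¼ãĥĵ, looks like a multi-byte string printed in the byte-level form that some BPE vocabularies use, rather than corrupted text. The source does not say which tokenizer the model uses, so the following is only a sketch under the assumption that it follows the GPT-2 style byte-to-unicode mapping. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2 style
    byte-level BPE (assumed here; the source does not name the tokenizer)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(cp) for b, cp in zip(byte_values, code_points)}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to raw bytes, then decode as UTF-8."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # errors="replace" because a single token may end mid-character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĥ¼ãĥĵ"))  # prints: ービ
```

Under that assumption the token decodes to the katakana sequence ービ. The plain ASCII tokens in both lists map to themselves, so they are unaffected by this step.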