INDEX
Explanations
references to collective events or actions
Negative Logits
ur        -0.17
ung       -0.16
aju       -0.14
irus      -0.14
aring     -0.14
uming     -0.14
amine     -0.14
mong      -0.14
ing       -0.14
ic        -0.14
Positive Logits
amber     0.15
Andre     0.14
abl       0.14
Andre     0.14
Alec      0.13
Hort      0.13
LinkId    0.13
utex      0.13
Ross      0.13
aleigh    0.12
Activations Density: 0.027%