INDEX
Explanations
proper names, particularly those of authors and contributors in academic contexts
Negative Logits
ubs      -0.16
Sala     -0.16
BÃŃ      -0.16
reib     -0.15
eks      -0.15
urve     -0.14
sov      -0.14
esson    -0.14
ooks     -0.14
imens    -0.13
Positive Logits
spacer   0.14
ugu      0.14
kitty    0.14
jav      0.14
ê·ľ      0.14
ushman   0.13
eck      0.13
ijľ      0.13
TRL      0.13
ost      0.13
Activations Density: 0.079%