Explanations
statements made by individuals in the context of discussions or reports on social issues
Negative Logits
raya      -0.17
uten      -0.16
utt       -0.15
оку       -0.15
missive   -0.15
rud       -0.15
leon      -0.15
voks      -0.15
vell      -0.15
gue       -0.14
Positive Logits
.glide     0.14
ark        0.14
Rh         0.13
standards  0.13
instein    0.13
скор       0.13
akers      0.13
prince     0.13
răng       0.12
reate      0.12
Activations Density 0.126%