INDEX
Explanations
direct speech or quotes from individuals
Negative Logits
Token        Logit
“            -0.27
âĢŀ          -0.25
“[           -0.23
(“           -0.21
“â̦          -0.18
``           -0.16
ãĢĮãģĤ       -0.16
iyim         -0.15
ñana         -0.15
بÙĪØ§Ø³Ø·Ø©   -0.15
Positive Logits
Token        Logit
ir           0.36
gether       0.35
bsite        0.34
adays        0.33
apons        0.31
ly           0.27
gnore        0.27
ory          0.26
tempts       0.25
crease       0.25
Activations Density: 0.173%