Explanations
phrases that indicate reports or findings in the media
Negative Logits
ime        -0.16
vel        -0.15
agement    -0.14
振り       -0.14
ching      -0.14
ellig      -0.13
Boulevard  -0.13
bes        -0.13
CHA        -0.13
柄         -0.13
Positive Logits
ležit      0.16
iros       0.16
ンツ       0.16
rawer      0.15
peria      0.15
raud       0.15
rani       0.15
RIX        0.14
/Dk        0.14
ristol     0.14
Activation density: 0.175%