Explanations
references to migration, asylum-seeking, and safety in the context of refugees
Negative Logits
native        -0.07
aln           -0.07
ead           -0.07
aea           -0.06
consort       -0.06
heed          -0.06
onn           -0.06
Progress      -0.06
alom          -0.06
_FAULT        -0.06
Positive Logits
sympathetic    0.08
safe           0.07
SAFE           0.07
мор            0.07
escape         0.07
safer          0.07
safety         0.06
629            0.06
safe           0.06
nearby         0.06
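A common way to obtain positive and negative logit lists like these is to project the feature's decoder direction through the model's unembedding matrix, then take the highest and lowest scoring tokens. The sketch below assumes exactly that and ignores any final LayerNorm scaling. `decoder_vec`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the toy weights are made up for illustration, not taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token as decoder_vec @ unembed.

    decoder_vec: the feature's decoder direction, length d_model (hypothetical).
    unembed: d_model rows, each of length vocab_size (hypothetical W_U).
    Any final LayerNorm is ignored in this sketch.
    """
    vocab_size = len(unembed[0])
    return [
        sum(d_i * row[t] for d_i, row in zip(decoder_vec, unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) lists of (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending scores
    negative = [(vocab[t], scores[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # largest first
    return positive, negative


# Toy example: d_model = 2, a 4-token vocabulary, made-up weights.
vocab = ["safe", "escape", "native", "aln"]
decoder_vec = [1.0, 0.5]
unembed = [[0.05, 0.04, -0.03, -0.01],
           [0.06, 0.06, -0.08, -0.04]]
positive, negative = top_and_bottom(logit_effects(decoder_vec, unembed), vocab, k=2)
print("positive:", [(tok, round(s, 2)) for tok, s in positive])
print("negative:", [(tok, round(s, 2)) for tok, s in negative])
```

Listing the negative side most-negative first matches the ordering used in the Negative Logits list.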
Activation Density: 0.021%
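Activation density is usually read as the fraction of sampled tokens on which the feature fires, meaning its activation is above zero. The minimal sketch below assumes that definition. The `activation_density` function, its `threshold` parameter, and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one token")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up sample: the feature fires on 21 of 100,000 tokens.
acts = [0.0] * 99_979 + [1.2] * 21
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.021% means the feature fires on roughly 21 of every 100,000 tokens.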