Explanations
interactions and conflicts involving threats and consequences
Negative Logits
(no token shown)  -0.16
plá               -0.15
foreign           -0.15
Foreign           -0.15
ux                -0.14
742               -0.14
foreign           -0.14
beck              -0.14
859               -0.14
clearColor        -0.14
Positive Logits
boycott     0.22
boyc        0.21
retali      0.18
threatened  0.18
pun         0.18
罚          0.17
threaten    0.17
threat      0.16
ostr        0.16
-threat     0.15
Activations Density 0.276%