INDEX
Explanations
references to historical events and acts of violence
Negative Logits
furt            -0.17
鼻              -0.16
aliz            -0.15
vard            -0.15
.AddParameter   -0.15
госп            -0.14
ži              -0.14
dej             -0.14
azing           -0.13
orne            -0.13
Positive Logits
resistance      0.28
Resistance      0.26
Resistance      0.26
Resist          0.24
resist          0.23
independence    0.23
freedom         0.22
Independence    0.21
مقاو            0.20
kháng           0.20
Activations Density 0.233%