INDEX
Explanations
occurrences of the word "war" and its related terms, indicating discussions about conflict and military actions
Negative Logits
ede         -0.17
chen        -0.16
нес         -0.16
ernal       -0.15
PERT        -0.15
enn         -0.15
ijo         -0.15
ular        -0.15
arity       -0.14
amiento     -0.14
Positive Logits
lord        0.21
rier        0.20
rior        0.20
lock        0.20
lords       0.19
far         0.17
fre         0.16
blers       0.16
locks       0.16
like        0.16
Activations Density 0.040%