Explanations
terms related to propaganda and recruitment activities
Negative Logits
arget     -0.17
eman      -0.15
errar     -0.15
eken      -0.15
βο        -0.14
ernaut    -0.14
aket      -0.14
βολ       -0.14
comput    -0.13
ematik    -0.13
Positive Logits
incer     0.16
ytic      0.15
_AI       0.15
illis     0.15
оÑĢÑĭ      0.14
æĭ©       0.14
ijk       0.14
zcze      0.14
iffs      0.14
yat       0.14
Activations Density 0.036%