Explanations
terms related to cyber attacks and misinformation tactics
Negative Logits
Autorizaciones   -0.41
ospiti           -0.41
avoient          -0.40
sizeCache        -0.40
デューサー        -0.39
betreft          -0.39
Vereine          -0.37
towarzys         -0.37
escritora        -0.36
illustrazione    -0.36
Positive Logits
sabotage         0.67
Sabo             0.58
poison           0.54
sabo             0.53
prank            0.53
deliberately     0.51
故意              0.50
poison           0.49
pranks           0.48
わざ              0.48
Activations Density 0.499%