Explanations
terms related to sabotage and subversion
sabotage and subvert
Negative Logits
########.     -0.58
kasarigan     -0.56
SpringRunner  -0.53
liski         -0.50
beliau        -0.50
therefrom     -0.49
extrême       -0.49
thereon       -0.47
fassen        -0.47
canestro      -0.46
Positive Logits
sabotage       1.52
Sabo           1.27
sabo           1.20
sab            0.72
破坏           0.58   (Chinese: "destroy; sabotage")
oteur          0.56
sabato         0.55
spy            0.52
undermining    0.50
thwart         0.49
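The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists, assumed here rather than confirmed for this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest scores. The sketch below uses plain Python and a toy two-dimensional model; `logit_effects`, `top_and_bottom`, and the toy weights are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Score each token by dot(decoder_dir, unembedding column).

    Hypothetical helper: `unembed` is assumed to be laid out as
    d_model rows by vocab_size columns, and any layer-norm folding
    a real pipeline may apply is ignored.
    """
    return {
        token: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, token in enumerate(vocab)
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]        # largest scores first
    negative = ranked[::-1][:k]  # most negative scores first
    return positive, negative


# Toy example: d_model = 2, vocabulary of three tokens (made-up weights).
vocab = ["sabotage", "spy", "thereon"]
decoder_dir = [1.0, 0.5]
unembed = [
    [2.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]
effects = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(effects, k=1)
print(positive)  # [('sabotage', 2.0)]
print(negative)  # [('thereon', -1.0)]
```

With a real model the same two steps apply, only with d_model in the thousands and a vocabulary of tens of thousands of tokens, which is why fragments such as "Sabo", "sab" and "oteur" can appear alongside whole words.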
Activation Density 0.012%
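Activation density is the fraction of sampled token positions on which the feature fires. Assuming "fires" means a strictly positive activation (the threshold behind this page's figure is not stated), 0.012% corresponds to roughly 12 activating positions per 100,000. A minimal sketch with a hypothetical `activation_density` helper and synthetic data:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption; a dashboard may use
    a different cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic data: 12 positive activations among 100,000 positions.
acts = [0.0] * 100_000
for pos in range(0, 12_000, 1_000):  # positions 0, 1000, ..., 11000
    acts[pos] = 3.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.012%
```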