Explanations
references to dangerous or aggressive actions or threats
following prepositions and conjunctions
disrupting and sabotaging
Negative Logits
Administrativna   -0.60
ViewFeatures      -0.56
Архівовано        -0.53
tamen             -0.53
optim             -0.52
ottim             -0.51
optimizer         -0.49
otim              -0.49
forgiving         -0.49
SPJ               -0.49
Positive Logits
disruption        0.86
destabili         0.85
disrupt           0.81
disrupting        0.78
sabotage          0.77
forcing           0.77
cripple           0.74
paral             0.74
intimidate        0.73
paraly            0.72
Activation Density: 0.489%