INDEX
Explanations
terms associated with interception and surveillance activities
Negative Logits
trekken        -0.48
peine          -0.46
gione          -0.44
tarto          -0.44
коно           -0.42
recommandée    -0.42
tqdm           -0.41
modelli        -0.40
SetBool        -0.40
precisam       -0.40
Positive Logits
intercept      1.01
intercepted    1.00
eaves          0.96
hack           0.89
intercepts     0.88
hijack         0.86
overheard      0.82
AndEndTag      0.82
interception   0.80
intercept      0.79
Activations Density: 0.991%