Explanations
requests for clicking a box to verify non-robot status
prompts or actions related to user interaction on a webpage
Negative Logits
ccording      -0.72
venge         -0.61
brunt         -0.56
mete          -0.56
proport       -0.54
helicop       -0.54
relative      -0.53
Constantin    -0.52
Anon          -0.51
comr          -0.50
Positive Logits
assis         0.79
PsyNet        0.70
Cancel        0.67
Asset         0.67
ricular       0.64
buttons       0.64
iframe        0.63
Sign          0.62
Download      0.62
taboola       0.61
Activations Density: 0.005%