INDEX
Explanations
checkboxes that need to be clicked to confirm non-robotic behavior
references to user interaction elements, particularly checkboxes and buttons
Negative Logits
ufact       -1.01
stood       -0.79
vironment   -0.78
istance     -0.73
¥ŀ          -0.68
ossier      -0.63
pestic      -0.63
senal       -0.62
Rated       -0.61
ngth        -0.61
Positive Logits
iru         0.71
skip        0.68
[+]         0.67
thumbnail   0.62
curl        0.62
Birch       0.61
Subscribe   0.60
BELOW       0.60
asses       0.59
boats       0.59
Activations Density: 0.030%