INDEX
Explanations
website-related prompts and calls to action
calls to action and references to privacy policies
Negative Logits
ishable     -0.60
Fargo       -0.56
Morg        -0.53
tongues     -0.52
ugu         -0.51
canon       -0.50
omorphic    -0.48
Homer       -0.48
lished      -0.48
Valhalla    -0.48
Positive Logits
dinand      0.62
omever      0.59
ockets      0.57
orpor       0.55
orce        0.55
Agenda      0.53
eph         0.52
settings    0.51
ħĭ          0.51
aws         0.50
Activations Density 0.073%