INDEX
Explanations
instructions or calls to action
references to social media interactions and follow requests
Negative Logits
soluble     -0.72
dfx         -0.66
venge       -0.62
uable       -0.62
Exc         -0.62
gasoline    -0.61
tu          -0.61
apon        -0.60
priority    -0.60
condos      -0.59
Positive Logits
closely      0.92
footsteps    0.81
hunt         0.73
steps        0.72
hran         0.71
route        0.68
Leaks        0.68
inspires     0.65
andel        0.64
blindly      0.64
Activations Density: 0.193%