INDEX
Explanations
references to surfing or skateboarding
references to waterboarding and related activities
Negative Logits
aments    -0.75
ively     -0.69
Bounty    -0.67
Frey      -0.66
talk      -0.65
ures      -0.65
edy       -0.65
mates     -0.63
orf       -0.63
Toll      -0.62
Positive Logits
yip          0.91
anchester    0.87
surfing      0.85
ancock       0.84
boarding     0.80
biking       0.73
xtap         0.72
referen      0.71
quished      0.71
ailand       0.70
Activations Density: 0.025%