Explanations
mentions of ice skaters or skating
references to skateboarding or related terminology
Negative Logits
Token          Logit
Mayweather     -0.76
Cause          -0.65
vengeance      -0.63
Manson         -0.63
Tud            -0.62
terday         -0.62
Geo            -0.62
hibition       -0.61
resentment     -0.60
patriarchal    -0.60
Positive Logits
Token          Logit
yrim           1.17
ipper          1.02
sk             1.02
IPP            0.99
ipped          0.94
ippers         0.94
ipp            0.94
oop            0.92
ull            0.92
irts           0.92
Activations Density: 0.006%