INDEX
Explanations
proper names starting with "Abdul"
the repeated mention of the name "Abdul" and its variants
Negative Logits
Token         Logit
CFR           -0.78
Meet          -0.74
Dominion      -0.70
weeney        -0.69
Rivals        -0.68
SEN           -0.67
Wolves        -0.67
Scottish      -0.67
Closed        -0.65
Leap          -0.65
Positive Logits
Token         Logit
Abdul         1.17
citiz         0.93
renheit       0.92
rahim         0.92
etheless      0.92
ibaba         0.92
versa         0.89
challeng      0.86
vertisement   0.85
andom         0.85
Activations Density 0.005%
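
To give a sense of scale for the density figure, here is a minimal sketch that assumes "activation density" means the fraction of sampled tokens on which the feature fires (activation above zero). The source does not define the term, so this reading, the function name `activation_density`, and the sample data are all hypothetical. Under that assumption, 0.005% corresponds to about 5 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 5 active tokens out of 100,000 (made-up values).
sample = [0.0] * 99_995 + [1.2, 0.4, 3.1, 0.9, 2.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse fires on very few tokens, which fits the narrow explanations listed above (names such as "Abdul" and its variants).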