INDEX
Explanations
academic positions and affiliations
mentions of academic institutions and their associated faculty or departments
Negative Logits
awa           -0.51
hump          -0.49
robbers       -0.48
condom        -0.47
headlights    -0.47
silence       -0.46
selfie        -0.44
withdrawals   -0.44
footprints    -0.44
backwards     -0.44
Positive Logits
.).           0.68
]).           0.63
)).           0.62
]),           0.61
).            0.58
)]            0.56
)),           0.56
)]            0.54
Synd          0.53
thur          0.53
Activations Density: 1.264%