INDEX
Explanations
mentions of rankings and positions in various contexts
Negative Logits
ranks        -0.69
cutting      -0.67
Britann      -0.66
Xiao         -0.65
Lawrence     -0.64
Dickinson    -0.64
Torch        -0.64
Wiley        -0.63
laure        -0.62
Tik          -0.62
Positive Logits
distance     1.23
committee    1.19
advertising  1.17
speech       1.15
diff         1.15
prison       1.15
terms        1.13
function     1.13
recomm       1.11
cong         1.11
Activations Density: 0.476%