INDEX
NEGATIVE LOGITS
Token      Logit
038        -0.07
41         -0.07
37         -0.07
46         -0.07
93         -0.07
College    -0.07
Korea      -0.07
Gerr       -0.07
38         -0.07
girl       -0.07
POSITIVE LOGITS
Token            Logit
advantages       0.16
advantage        0.15
Advantage        0.12
advantageous     0.11
antages          0.10
вай              0.08
advant           0.08
disadvantage     0.08
disadvantages    0.08
optimized        0.07
Activations Density 0.014%