INDEX
Explanations
words related to granting or receiving favors, especially in political or business contexts
references to political favors and favoritism
Negative Logits
yrinth      -0.79
borg        -0.79
oufl        -0.73
anwhile     -0.73
prototype   -0.71
ways        -0.68
bridge      -0.67
mberg       -0.66
teen        -0.66
ı           -0.66
Positive Logits
itism       1.62
ited        1.00
ably        0.92
itures      0.89
favors      0.79
itive       0.77
hift        0.76
IOR         0.75
iture       0.71
favoring    0.70
Activations Density 0.021%