Explanations
phrases related to showing preference or bias towards something or someone
terms related to favoritism and support
Negative Logits
prototype      -0.75
bridge         -0.72
borg           -0.70
Creed          -0.67
ADVERTISEMENT  -0.67
OTT            -0.67
ı              -0.66
oufl           -0.65
gaard          -0.64
Gorge          -0.63
Positive Logits
itism          1.74
ited           1.02
ably           1.00
itures         0.85
hift           0.83
favoring       0.80
itic           0.78
uate           0.76
abilia         0.76
naire          0.75
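The token lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). Below is a minimal sketch of how such a ranking can be produced. It assumes that each token's score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. This is a common convention for SAE feature dashboards, but this page does not state it. The names `decoder_dir`, `W_U`, and `top_logit_effects`, and the toy data, are hypothetical placeholders.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption (hypothetical): score[t] = decoder_dir @ W_U[:, t], i.e. the
    direct effect of the feature's decoder direction on each output logit.
    """
    scores = decoder_dir @ W_U                  # shape: (vocab_size,)
    order = np.argsort(scores)                  # indices, lowest score first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy setup: 4-dimensional residual stream, 5-token vocabulary.
rng = np.random.default_rng(0)
vocab = ["itism", "ited", "ably", "bridge", "prototype"]
W_U = rng.normal(size=(4, len(vocab)))
decoder_dir = rng.normal(size=4)

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print("Negative:", neg)
print("Positive:", pos)
```

Sorting once and reading both ends keeps the two lists consistent: the negative list is in ascending order and the positive list is in descending order, which matches how the dashboard lists appear.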
Activation Density: 0.011%
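Activation density is the share of tokens on which the feature fires. A density of 0.011% corresponds to roughly 1 token in 9,000, so this is a very sparse feature. The sketch below computes density from a hypothetical array of per-token activations. It assumes that "fires" means a strictly positive activation, which is the usual convention for ReLU-style SAE features, although this page does not state its threshold. The function name `activation_density` is illustrative only.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption (hypothetical): a token counts as active when its activation
    is strictly greater than `threshold` (0 by default).
    """
    activations = np.asarray(activations, dtype=float)
    return float(np.mean(activations > threshold))

# Hypothetical activations over 100,000 tokens, with the feature active on 11.
acts = np.zeros(100_000)
acts[:11] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.011%
```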