INDEX
Explanations
references to the term "Fair" in various contexts
NEGATIVE LOGITS
Token               Logit
longleftrightarrow  -0.17
esser               -0.17
etter               -0.17
eer                 -0.17
ocker               -0.16
erro                -0.15
abwe                -0.15
ector               -0.15
owy                 -0.15
eko                 -0.15
POSITIVE LOGITS
Token   Logit
banks   0.36
mont    0.32
mount   0.30
haven   0.29
child   0.28
view    0.25
bank    0.25
ly      0.25
childs  0.24
ouz     0.24
Activations Density 0.013%