INDEX
Explanations
words related to controversial topics or statements
occurrences of the letter 'b'
Negative Logits
Wer       -0.60
BART      -0.59
bilt      -0.58
Stall     -0.58
Audi      -0.57
hyd       -0.57
phon      -0.56
justice   -0.56
Bayern    -0.56
tunes     -0.55
Positive Logits
rought    1.43
rief      1.31
odies     1.30
esides    1.28
izarre    1.26
ombs      1.26
urden     1.25
ureau     1.21
eware     1.19
usting    1.18
Activations Density 0.037%