Explanations
mentions of the word "Bir" with varying activation values
words related to the term "biracial."
Negative Logits
nomine      -0.67
eur         -0.67
ĵĺ          -0.63
EMBER       -0.61
Tune        -0.60
paternal    -0.59
tune        -0.58
wise        -0.58
Unit        -0.58
plat        -0.57
Positive Logits
mingham     1.28
thing       1.08
git         1.05
ging        0.94
ney         0.92
chell       0.91
keley       0.91
combe       0.87
gel         0.87
ulia        0.86
Activations Density 0.021%