Explanations
Mentions of the word "family" or its variations
Negative Logits
Token    Logit
avar     -0.15
iph      -0.14
cho      -0.14
638      -0.14
jah      -0.14
Yer      -0.14
apor     -0.14
843      -0.14
tie      -0.14
PN       -0.14
Positive Logits
Token    Logit
恋       0.15
šak      0.15
ynth     0.14
巨       0.14
dex      0.14
EDA      0.14
лоп      0.14
ilin     0.14
론       0.14
polator  0.14
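The logit lists pair each token with a score and keep only the ten most negative and ten most positive entries. A minimal sketch of that selection step, assuming each list is simply the k most extreme entries of a per-token score vector (how the scores themselves are computed is not stated here). The function name `top_logits` and the filler token "the" with score 0.0 are hypothetical; the other values are taken from the lists.

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def top_logits(scores: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return (k most negative, k most positive) (token, score) pairs."""
    # nsmallest/nlargest with a key behave like sorted(...)[:k], so ties keep input order.
    negative = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    return negative, positive

# Illustrative scores: four values from the lists above plus one hypothetical filler token.
scores = {"avar": -0.15, "iph": -0.14, "the": 0.0, "ynth": 0.14, "šak": 0.15}
neg, pos = top_logits(scores, k=2)
print(neg)  # most negative first
print(pos)  # most positive first
```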
Activation Density: 0.001%
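Activation density is read here as the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0 (the threshold and the function name `activation_density` are assumptions, not taken from the dashboard). At 0.001%, the feature fires on about 1 token in 100,000.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)

# Example: the feature fires on 1 of 100,000 token positions.
acts = [0.0] * 100_000
acts[42] = 3.7
density = activation_density(acts)
print(f"Activation Density: {density:.3%}")  # prints "Activation Density: 0.001%"
```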