Explanations
proper nouns
the presence of the word "zer" in various contexts
Negative Logits
ership     -0.97
ers        -0.83
erest      -0.75
raising    -0.74
anooga     -0.68
rast       -0.66
luent      -0.65
ivity      -0.65
ivities    -0.65
ifice      -0.63
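Logit lists like this one are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that recipe. The names `logit_effects`, `decoder_dir` and `W_U` are hypothetical, the weights are random toy data, and the real dashboard pipeline may also normalize the scores or fold in the final layer norm.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive tokens for one feature.

    Assumed shapes (hypothetical, for illustration):
      decoder_dir: (d_model,)       the feature's decoder row
      W_U:         (d_model, d_vocab) the model's unembedding matrix
      vocab:       list of token strings indexed by token id
    """
    # Direct effect of one unit of feature activation on every output logit.
    effects = decoder_dir @ W_U          # shape (d_vocab,)
    order = np.argsort(effects)          # token ids, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with random weights (not the real model).
rng = np.random.default_rng(0)
d_model, d_vocab = 8, 20
W_U = rng.normal(size=(d_model, d_vocab))
decoder_dir = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(d_vocab)]
neg, pos = logit_effects(decoder_dir, W_U, vocab, k=3)
```

Sorting ascending yields the negative list; reversing that order yields the positive list, so both tables come from a single projection.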
Positive Logits
zer        0.80
geist      0.79
abwe       0.79
vous       0.75
otonin     0.74
imbabwe    0.70
覚醒       0.69
ploy       0.69
ズ         0.69
ppelin     0.69
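Non-ASCII tokens can appear in a dashboard in their byte-level BPE form, e.g. `è¦ļéĨĴ` and `ãĤº`, which are GPT-2-style encodings of the UTF-8 bytes for 覚醒 and ズ. The sketch below assumes the model uses the GPT-2 byte-to-unicode table; `decode_token` is a hypothetical helper name, not part of any library.

```python
def bytes_to_unicode():
    """GPT-2 style table mapping each byte value (0-255) to a printable character."""
    # Printable Latin-1 bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

def decode_token(token):
    """Hypothetical helper: turn a byte-level token string back into text."""
    byte_decoder = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(byte_decoder[ch] for ch in token)
    # errors="replace" guards against tokens that hold only part of a character.
    return raw.decode("utf-8", errors="replace")

print(decode_token("è¦ļéĨĴ"), decode_token("ãĤº"))
```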
Activation density: 0.017%
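A density of 0.017% means the feature is active at roughly 17 of every 100,000 token positions, or about one in 5,900. The sketch below assumes density is the fraction of sampled positions with strictly positive activation; the dashboard's actual sample and threshold are not stated here, and `activation_density` is a hypothetical name.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy example: 17 active positions out of 100,000.
acts = np.zeros(100_000)
acts[:17] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```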