INDEX
Explanations
proper nouns or aliases of public figures
phrases or terms that indicate alternative names or aliases for individuals
Negative Logits
stal        -0.70
stad        -0.69
Impl        -0.68
pson        -0.66
ÃŃs         -0.64
urrent      -0.63
hani        -0.62
avorable    -0.62
Topics      -0.62
onel        -0.62
Positive Logits
CoC         0.78
HERO        0.75
Chosen      0.74
yours       0.70
"@          0.67
Boss        0.67
æ           0.65
oxy         0.64
"#          0.64
è£ıè        0.62
Activations Density 0.015%