Explanations
proper nouns and commercial product names
Negative Logits
oÄŁ       -0.55
alle      -0.53
Balt      -0.52
trave     -0.51
uth       -0.51
concess   -0.49
allah     -0.49
ursed     -0.48
©¶æ       -0.48
ather     -0.48
Positive Logits
sibling    0.53
sister     0.53
successor  0.52
mascot     0.52
ieth       0.51
Tik        0.51
sponsor    0.51
Polaris    0.51
sequel     0.51
Newt       0.50
Activations Density 7.035%