Explanations
words that appear to be names of people or characters
proper nouns, particularly names and initials
Negative Logits
ModLoader           -0.84
rency               -0.76
enegger             -0.74
pection             -0.74
theless             -0.73
externalActionCode  -0.70
heit                -0.70
ornia               -0.70
netflix             -0.68
kefeller            -0.68
Positive Logits
henko     0.82
opoulos   0.77
ovsky     0.77
akis      0.77
jad       0.77
Ö¼        0.77
emort     0.76
kov       0.74
uly       0.66
Kear      0.66
Activations Density: 0.282%