INDEX
Explanations
proper nouns
possessive forms indicating ownership or association
Negative Logits
Token       Logit
ISO         -0.76
hari        -0.74
lehem       -0.74
linked      -0.73
olkien      -0.72
ENN         -0.72
KT          -0.72
Reviewer    -0.68
":[         -0.65
lling       -0.65
Positive Logits
Token         Logit
own           1.00
newest        0.97
signature     0.92
footsteps     0.92
penchant      0.89
finest        0.89
insistence    0.87
whereabouts   0.86
ullivan       0.85
panic         0.85
Activations Density 0.157%