INDEX
Explanations
proper nouns, specifically related to locations and people
references to specific places or concepts, particularly those related to principles and financial contexts
Negative Logits
方          -0.82
Leilan      -0.77
Redditor    -0.77
Redemption  -0.72
Salvation   -0.70
Odyssey     -0.68
ynski       -0.68
aram        -0.68
eus         -0.67
ean         -0.67
Positive Logits
cipled      1.24
cers        1.17
ciples      1.15
cipl        1.10
ces         1.09
gments      1.00
ciating     0.99
cially      0.93
cing        0.93
thood       0.93
Activations Density: 0.026%