INDEX
Explanations
words related to interviews and statements from various news sources
possessive pronouns or possessive constructions
Negative Logits
sson      -0.79
anges     -0.79
mo        -0.77
linked    -0.69
ro        -0.69
ovi       -0.68
dylib     -0.67
forest    -0.66
ppo       -0.66
ol        -0.66
Positive Logits
newest     1.02
own        0.94
finest     0.90
flagship   0.90
latest     0.88
biggest    0.86
sake       0.83
signature  0.80
ogyn       0.80
official   0.79
Activations Density: 0.157%