Explanations
names or mentions of specific individuals
names related to a specific individual, likely in the context of activism or a company
Negative Logits
é¾įå¥ij士        -0.82
©¶æ             -0.81
Archdemon       -0.76
ccording        -0.72
å§              -0.71
indebted        -0.68
xious           -0.67
Flavoring       -0.66
magnification   -0.66
glers           -0.65
Positive Logits
enhagen         0.86
tein            0.83
illions         0.82
anamo           0.82
hemy            0.81
steen           0.81
ieri            0.80
coni            0.80
inen            0.78
nih             0.78
Activations Density 0.178%