Explanations
names of people in various contexts
names of people, specifically those in entertainment or culture
Negative Logits
Token              Logit
=-=-=-=-=-=-=-=-   -0.54
glim               -0.51
Reviewer           -0.49
DonaldTrump        -0.48
âĸĵ                -0.48
NETWORK            -0.47
BOOK               -0.46
20439              -0.45
prompt             -0.44
bulletin           -0.44
Positive Logits
Token              Logit
respectively       0.57
)|                 0.57
anas               0.53
ande               0.50
)]                 0.48
ado                0.47
ys                 0.47
anus               0.46
tis                0.45
)--                0.45
Activations Density 1.762%