Explanations
names of individuals or proper nouns in the context of significant actions or events
Negative Logits
Token            Logit
selection        -0.69
manipulating     -0.65
'.               -0.64
disproportion    -0.63
interfering      -0.62
LEASE            -0.61
âĶĢâĶĢ           -0.61
preferential     -0.60
paralle          -0.60
%%               -0.60
Positive Logits
Token            Logit
said             1.14
Said             1.00
exclaimed        0.97
wrote            0.96
says             0.94
told             0.93
joked            0.92
recalls          0.92
said             0.90
recalled         0.89
Activations Density: 0.104%