INDEX
Explanations
verbs related to urging or requesting action
Negative Logits
istg      -0.73
bilt      -0.69
oday      -0.62
————      -0.62
olitics   -0.61
Laughs    -0.59
••••      -0.58
ynski     -0.57
monop     -0.57
bunny     -0.57
Positive Logits
backs     1.04
attention 0.96
forth     0.94
igraph    0.92
oused     0.92
ouses     0.80
phas      0.80
upon      0.79
ously     0.78
plates    0.77
Activations Density 0.041%