Explanations
verbs related to actions taken by individuals or groups
Negative Logits
selves      -0.87
Higher      -0.74
millenn     -0.71
selves      -0.67
mint        -0.65
Versions    -0.65
%%          -0.65
aroo        -0.64
illion      -0.63
illions     -0.61
Positive Logits
herself     1.19
himself     1.01
adamant     0.76
her         0.75
Himself     0.75
his         0.74
coy         0.71
Tsarnaev    0.70
arra        0.67
hers        0.66
Activations Density: 0.437%