Explanations
verbs that describe actions or states
verbs suggesting explanations or descriptions of actions
Negative Logits
selves      -0.82
selves      -0.66
FL          -0.62
mint        -0.61
Copyright   -0.61
millenn     -0.59
illion      -0.58
halla       -0.58
aroo        -0.57
istani      -0.57
Positive Logits
herself     1.21
himself     1.07
her         0.87
his         0.87
Himself     0.85
hers        0.79
Chess       0.60
™:          0.60
his         0.59
Boh         0.58
Activations Density 0.474%
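
The page does not say how the density figure is computed. One common reading, assumed here, is the fraction of sampled token positions at which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 5 of which are active.
toy_activations = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.474% would mean the feature is active on roughly 1 in 211 sampled token positions.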