INDEX
Explanations
action verbs that denote strong or impactful actions
gerunds and present participles indicating actions
Negative Logits
available             -0.70
eg                    -0.69
peg                   -0.67
oha                   -0.66
û                     -0.66
(token not rendered)  -0.65
Reply                 -0.63
ana                   -0.62
yon                   -0.62
utch                  -0.62
Positive Logits
ãĥ¥         0.70
=]          0.61
agents      0.60
arts        0.59
redients    0.58
agent       0.57
aliases     0.57
LESS        0.57
starvation  0.56
aside       0.55
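The Negative Logits and Positive Logits lists above rank vocabulary tokens by how strongly this feature suppresses or promotes them. Below is a minimal sketch of how such lists are commonly produced. It assumes, without confirmation from this page, that each score is the dot product of the feature's decoder direction with a token's unembedding vector, with no layer norm or rescaling applied. The function name `logit_effects` and the two-dimensional toy vectors are hypothetical and serve only to illustrate the ranking step.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_vec: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score each token as dot(decoder_vec, unembed[token])
    (assumed scoring rule), then return the k lowest and k highest scores."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return negative, positive


# Toy, made-up vectors (not taken from any real model).
decoder = [1.0, -0.5]
unembed = {
    "agents": [0.4, -0.4],
    "aside": [0.3, 0.0],
    "available": [-0.5, 0.4],
    "eg": [-0.2, 0.2],
}

neg, pos = logit_effects(decoder, unembed, k=2)
print([t for t, _ in neg])  # ['available', 'eg']
print([t for t, _ in pos])  # ['agents', 'aside']
```

Sorting once and slicing both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` and `heapq.nlargest`) would avoid a full sort.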
Activations Density 0.272%
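Activation density can be read as the share of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero, a threshold this page does not state. The name `activation_density` and the example counts are hypothetical. Under that assumption, a density of 0.272% corresponds to roughly 272 active positions per 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds
    `threshold` (assumed definition of "firing")."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Illustrative counts only: 272 firing positions out of 100,000.
acts = [1.0] * 272 + [0.0] * (100_000 - 272)
print(f"{activation_density(acts):.3f}%")  # prints 0.272%
```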