INDEX
Explanations
actions or tasks that are attempted but not successful
verbs that indicate action or change in state
Negative Logits
eg        -0.79
ember     -0.72
mob       -0.71
eria      -0.70
Phys      -0.68
.?        -0.68
âĢij      -0.68
mp        -0.66
wow       -0.66
peg       -0.65
Positive Logits
aside         0.69
theless       0.67
ãĥ¥           0.64
nonetheless   0.64
unsus         0.63
instead       0.59
itely         0.58
=]            0.57
them          0.57
only          0.57
Activations Density: 0.326%
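
Some token strings in the logit lists, such as ãĥ¥, look like encoding debris but are plausibly the printable stand-ins that byte-level BPE vocabularies use for raw UTF-8 bytes. The sketch below assumes a GPT-2-style byte-to-character table (the page does not say which tokenizer produced these tokens) and maps such a string back to readable text. The names `bytes_to_unicode` and `decode_token` are illustrative helpers, not part of this page or of any particular library.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as used by GPT-2-style byte-level BPE.

    Assumption: the tokens on this page come from such a tokenizer.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    chars = printable[:]
    n = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted into the range starting at U+0100.
            printable.append(b)
            chars.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(printable, chars)}


CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to text; invalid UTF-8 becomes U+FFFD."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "\u00e3\u0125\u00a5" is the three-character token shown in the positive logits.
    print(repr(decode_token("\u00e3\u0125\u00a5")))
    print(repr(decode_token("nonetheless")))
```

Plain ASCII tokens pass through unchanged, so the helper can be applied to every row. A token that holds only part of a multi-byte character decodes to the replacement character rather than raising an error.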