INDEX
Explanations
phrases related to confusion or lack of understanding
repeated references to the concept of "going on," indicating a search for clarity or understanding in a situation
Negative Logits
gart       -0.72
vale       -0.65
ament      -0.59
inates     -0.57
stra       -0.54
adjusted   -0.54
aments     -0.52
Desk       -0.52
rede       -0.52
relinqu    -0.51
Positive Logits
wrong       0.91
happening   0.80
viral       0.75
Wrong       0.71
happen      0.70
wrong       0.70
ãĥ£         0.69
ggle        0.69
lems        0.69
æĸ¹         0.68
Activations Density 0.031%