INDEX
Explanations
phrases indicating ongoing actions or events
Negative Logits
ernet    -0.20
azor     -0.16
idi      -0.15
pone     -0.15
606      -0.15
udio     -0.14
udi      -0.14
raz      -0.14
inte     -0.14
æķ       -0.13
Positive Logits
happening    0.24
wrong        0.24
happen       0.23
bump         0.20
wrong        0.19
happened     0.18
happens      0.17
Wrong        0.17
uns          0.17
going        0.17
Activations Density: 0.014%