Explanations
conversational exchanges and dialogue structure in the text
Negative Logits
tbh           -0.79
tasked        -0.67
ngl           -0.67
impactful     -0.64
multiple      -0.64
Idk           -0.63
idk           -0.63
bestie        -0.63
Thankfully    -0.63
Notably       -0.62
Positive Logits
muß           0.79
faßt          0.77
everybody     0.70
lousy         0.68
everybody     0.65
Daß           0.65
somebody      0.65
müßte         0.62
daß           0.61
Everybody     0.59
Activations Density: 1.030%