Explanations
sentences that introduce or discuss different topics
Negative Logits
ãĤ·ãĥ£    -0.67
ctors     -0.62
srf       -0.57
ĪĴ        -0.56
idth      -0.54
Weather   -0.53
Tro       -0.53
ao        -0.52
Crush     -0.52
ãĥ«      -0.51
Positive Logits
anyway        0.99
anyways       0.98
aloud         0.96
yourself      0.94
ourselves     0.93
myself        0.93
firsthand     0.91
ASAP          0.89
herself       0.88
anonymously   0.86
Activations Density: 5.822%