INDEX
Explanations
phrases indicating a transition or shift in focus
Negative Logits
eton    -0.69
lde     -0.61
ifax    -0.60
en      -0.56
weet    -0.54
ania    -0.52
ource   -0.52
lie     -0.52
ens     -0.52
uran    -0.51
Positive Logits
agy       0.56
Gupta     0.56
chest     0.52
zu        0.49
ggle      0.49
usions    0.49
kered     0.47
Sharma    0.47
Reviewer  0.46
BALL      0.46
Activations Density 0.187%