Explanations
transitions or conclusions in the narrative
Negative Logits
"}")        -0.61
ſelves      -0.59
(missing)   -0.53
*/)         -0.53
ாள          -0.52
الحره       -0.52
ete         -0.51
}")         -0.51
aarrggbb    -0.50
soe         -0.50
Positive Logits
yeah        1.11
yes         1.03
how         0.93
what        0.90
far         0.89
oner        0.88
why         0.87
if          0.86
basically   0.83
imagine     0.75
Activations Density 0.068%