INDEX
Explanations
punctuation marks and specific formatting related to textual structure
Negative Logits
but            -0.89
sometimes      -0.86
But            -0.81
parfois        -0.80
But            -0.78
but            -0.78
whatever       -0.78
Maybe          -0.78
How            -0.77
Sometimes      -0.77
Positive Logits
Commenting     1.13
According      0.86
According      0.86
commenting     0.77
Earlier        0.74
commented      0.73
Speaking       0.72
Speaking       0.72
Following      0.71
congratulated  0.71
Activations Density 0.244%