INDEX
Explanations
introductory phrases indicating the beginning of a new topic or statement
introductory phrases that signal the start of a list or explanation
Negative Logits
ween      -0.85
kids      -0.76
sung      -0.74
mens      -0.71
abled     -0.71
lain      -0.71
sports    -0.70
anim      -0.69
thirst    -0.69
breeding  -0.68
Positive Logits
introdu          0.89
volley           0.89
foremost         0.76
admit            0.72
congratulations  0.71
apologies        0.69
ega              0.68
apologize        0.67
lesson           0.65
premise          0.65
Activations Density: 0.115%