INDEX
Explanations
phrases indicating continuation or progression of a story or discussion
occurrences of the word "so" used to introduce statements or comments
Negative Logits
SourceFile     -0.70
PF             -0.65
DERR           -0.62
Angelo         -0.60
ļéĨĴ           -0.60
rule           -0.60
verage         -0.59
ailability     -0.59
zag            -0.57
saf            -0.57
Positive Logits
yeah             1.14
uh               0.91
yes              0.86
alas             0.85
um               0.84
needless         0.83
beware           0.81
congr            0.81
please           0.80
unsurprisingly   0.80
Activations Density: 0.068%
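
One way to read the density figure is as the share of token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the toy activation list are all assumptions made for illustration, and the dashboard's actual computation may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; this is an illustrative definition, not one
    confirmed by the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 7 of which fire.
toy = [0.0] * 9993 + [1.5, 0.2, 3.1, 0.7, 0.9, 2.2, 0.4]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.068% would mean the feature fires on roughly 7 of every 10,000 tokens, which fits a feature tied to a narrow set of discourse markers such as "so", "yeah", and "um".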