INDEX
Explanations
punctuation marks indicating continuation or separation of thoughts
Negative Logits
Balanced    -0.71
geon        -0.68
DAQ         -0.66
Charg       -0.62
Medals      -0.58
Explan      -0.58
èĢħ         -0.58
Charg       -0.58
è£ħ         -0.58
Nex         -0.57
Positive Logits
uh              1.26
um              1.13
albeit          0.82
say             0.79
ah              0.78
oh              0.78
say             0.77
gasp            0.76
yeah            0.76
respectively    0.76
Activations Density: 0.105%