INDEX
Explanations
phrases indicating communication or instructions being given
Negative Logits
Token            Logit
oshenko          -0.71
idates           -0.61
operates         -0.60
egal             -0.58
ses              -0.57
nowadays         -0.56
notoriously      -0.55
ouls             -0.54
understandably   -0.53
occupies         -0.52
Positive Logits
Token            Logit
myself           1.59
my               1.19
him              1.02
ourselves        0.98
them             0.89
ucc              0.80
mine             0.78
THEM             0.77
igree            0.69
yss              0.69
Activations Density 0.534%