INDEX
Explanations
instances of the word "assistant" and related variations
Negative Logits
itſelf    -1.06
purpoſe   -1.04
poffible  -0.99
ſelf      -0.99
greateſt  -0.99
―――――     -0.97
pleaſure  -0.97
uſ        -0.97
reaſon    -0.96
Diſ       -0.94
Positive Logits
hire      0.82
Hire      0.68
Dog       0.64
          0.61
↵↵↵       0.57
Do        0.56
stars     0.56
|         0.56
include   0.55
'         0.54
Activations Density: 0.091%