INDEX
Explanations
statements related to authority figures or individuals speaking in a formal context
Negative Logits
Corner      -0.15
corner      -0.14
Wildlife    -0.14
Enemy       -0.14
åķı         -0.14
ask         -0.13
Ask         -0.13
entry       -0.13
.ask        -0.13
usher       -0.13
Positive Logits
added       0.28
said        0.23
added       0.22
-added      0.21
_added      0.20
Added       0.19
continued   0.18
ajout       0.18
Added       0.17
continued   0.17
Activations Density: 0.030%