Explanations
instances of personal introductions and conversational markers
Negative Logits
ç²¾            -0.16
.Names         -0.15
íģ             -0.14
rollo          -0.14
_reporting     -0.14
baugh          -0.14
ervo           -0.14
IFY            -0.14
оÑĢ            -0.13
Confederate    -0.13
Positive Logits
jet            0.15
ihn            0.14
062            0.14
astos          0.14
me             0.13
лÑĥÑĪ         0.13
Jet            0.13
bro            0.13
rire           0.13
otype          0.13
Activations Density 0.001%