INDEX
Explanations
sentences directed at or mentioning the listener
Negative Logits
¥µ          -0.81
Ĥª          -0.80
¿½          -0.78
ĺħ          -0.76
entimes     -0.75
ĸļ          -0.74
enges       -0.73
20439       -0.72
EStream     -0.72
ĨĴ          -0.71
Positive Logits
guys        1.43
're         1.27
yourselves  1.27
gentlemen   1.02
've         1.00
sir         0.99
tub         0.91
yourself    0.89
'll         0.88
bast        0.87
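Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and then reading off the largest and smallest entries. The sketch below assumes that convention. The function name `top_logit_effects` and its arguments are hypothetical, and this dashboard may additionally apply normalization (for example the final layer norm) that is omitted here.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    w_dec_row: shape (d_model,), the feature's decoder vector (assumed input).
    w_unembed: shape (d_model, vocab_size), the model's unembedding matrix.
    Returns (top_positive, top_negative), each a list of (token_id, value).
    """
    effects = w_dec_row @ w_unembed  # one value per vocabulary token
    order = np.argsort(effects)      # ascending: most negative first
    top_neg = [(int(i), float(effects[i])) for i in order[:k]]
    top_pos = [(int(i), float(effects[i])) for i in order[::-1][:k]]
    return top_pos, top_neg

# Toy example: with an identity unembedding, the effects equal the decoder row.
pos, neg = top_logit_effects(np.array([0.5, -1.0, 2.0, 0.0]), np.eye(4), k=2)
print(pos)  # [(2, 2.0), (0, 0.5)]
print(neg)  # [(1, -1.0), (3, 0.0)]
```

Mapping each token id back to its string (e.g. `guys`, `'re`) would require the model's tokenizer, which is not shown here.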
Activation Density: 0.147%
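Activation density is usually read as the percentage of sampled token positions on which the feature fires at all. A density of 0.147% therefore means roughly 147 active positions per 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name and threshold parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of positions where activation > threshold."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 147 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:147] = 1.0
print(activation_density(acts))  # approximately 0.147
```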