Explanations
instances of dialogue and conversational elements
Negative Logits
рес  -0.14
affen  -0.14
ーチ  -0.14
ンバー  -0.14
509  -0.13
mít  -0.13
اهم  -0.13
zp  -0.13
ergarten  -0.13
ackson  -0.13
Positive Logits
notice  0.43
noticing  0.40
notices  0.39
noticed  0.39
noticed  0.35
Notice  0.33
notice  0.32
hear  0.30
Notice  0.30
see  0.29
Activations Density 0.317%