INDEX
Explanations
direct speech, quotations, and direct questions
dialogue and statements made by characters
Negative Logits
£ı           -0.85
malink       -0.69
agos         -0.65
atures       -0.62
å§«          -0.61
wildfires    -0.61
"]=>         -0.61
unity        -0.61
utton        -0.60
ãĥĩ          -0.59
Positive Logits
hello        0.91
aloud        0.77
politely     0.76
loudly       0.74
dayName      0.71
yip          0.71
goodbye      0.71
inviting     0.68
dissatisf    0.67
angrily      0.66
Activations Density 0.352%
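
The density figure above is commonly read as the fraction of token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and do not come from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active (activation > threshold). The source page does not
    confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 1000 token positions.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.352% would correspond to the feature being active on roughly 35 of every 10,000 tokens.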