Explanations
instances of dialogue and conversational exchanges among characters
Negative Logits
irr      -0.15
Wi       -0.15
pid      -0.15
Don      -0.15
Ћ        -0.14
dam      -0.14
assin    -0.14
edik     -0.14
unless   -0.14
don      -0.14
Positive Logits
군요     0.19
なる     0.17
Ingen    0.15
orman    0.15
inar     0.15
wow      0.14
レン     0.14
賢       0.14
anship   0.14
erville  0.14
Activations Density 0.213%