Explanations
dialogue sentences marked with punctuation at the end
Negative Logits
Token       Logit
ktop        -0.87
acan        -0.79
undai       -0.78
etz         -0.76
spons       -0.74
lege        -0.74
isconsin    -0.74
adelphia    -0.72
isite       -0.72
confir      -0.71
Positive Logits
Token       Logit
ITNESS      1.06
âĶĢâĶĢâĶĢâĶĢ    0.94
Reply       0.92
¯¯¯¯¯¯¯¯    0.86
Suddenly    0.85
--------------------------------------------------------  0.84
Reward      0.83
CONTIN      0.80
Tears       0.79
Plug        0.78
Activations Density 7.298%
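
The density figure above is given without a definition. A minimal sketch of one common reading follows, assuming density means the percentage of tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical per-token activations for a tiny sample; not data from this feature.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 7.298% would mean the feature is active on roughly 7 of every 100 tokens in the evaluated text.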