INDEX
Explanations
triggering phrases focusing on a specific subject matter within text
instances of a specific character or symbol that signifies a type of response or commentary
Negative Logits
Windsor         -0.76
Yor             -0.70
Suzuki          -0.67
Rack            -0.65
Roose           -0.65
Photographer    -0.64
Laurent         -0.64
Belfast         -0.62
Showdown        -0.62
Xan             -0.62
Positive Logits
agree           0.86
should          0.84
rely            0.84
ve              0.82
s               0.82
t               0.82
felt            0.81
were            0.81
¯¯              0.81
require         0.81
Activations Density: 0.197%
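
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes, and this is an assumption rather than something the page states, that density means the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The page itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token activations, only 2 of them non-zero.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.200%"
```

Under that reading, a density of 0.197% would mean the feature fires on roughly 2 of every 1,000 sampled tokens.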