INDEX
Explanations
phrases expressing surprise or realization
phrases expressing surprise or disbelief
Negative Logits
Token       Logit
Dialogue    -0.81
adr         -0.72
ribut       -0.70
Rel         -0.69
gp          -0.64
ioxide      -0.63
utic        -0.62
verend      -0.61
rough       -0.60
bis         -0.60
Positive Logits
Token       Logit
beforehand  0.75
bothered    0.70
Saban       0.70
Bout        0.64
existed     0.63
myself      0.63
Kamp        0.63
intending   0.62
terday      0.62
spoiled     0.61
Activations Density: 0.317%