INDEX
Explanations
phrases or statements that are being repeated or emphasized
the repetition of statements or claims
Negative Logits
iop        -0.81
Torrent    -0.73
ipel       -0.73
RH         -0.73
mile       -0.70
onz        -0.69
opic       -0.69
oping      -0.68
Offline    -0.68
lio        -0.65
Positive Logits
reiter         1.08
reaff          1.01
reiterated     0.88
affirmation    0.86
vows           0.84
reiterate      0.83
affirm         0.78
disav          0.71
underscores    0.70
repud          0.69
Activations Density 0.015%