Explanations
Repetitive phrases and statements that emphasize functions or actions.
Negative Logits
gage     -0.07
orgh     -0.06
arez     -0.06
rouch    -0.06
osit     -0.06
ander    -0.06
plor     -0.06
ress     -0.06
Paid     -0.06
ibble    -0.06
Positive Logits
indeed    0.08
Indeed    0.07
undle     0.07
Justice   0.06
SG        0.06
yles      0.06
raud      0.06
znam      0.06
Rae       0.06
Indeed    0.06
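The page does not say how the logit lists are computed. One common approach, which this sketch assumes and does not confirm, projects the feature's decoder direction through the model's unembedding matrix and keeps the most negative and most positive entries. The function `top_logits` and the names `decoder_dir`, `W_U` and `vocab` are hypothetical, and the toy data is illustrative only.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumed shapes (hypothetical, not taken from the page):
      decoder_dir: (d_model,) decoder direction for one feature
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       list of vocab_size token strings
    """
    logits = decoder_dir @ W_U          # direct effect on each output token
    order = np.argsort(logits)          # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, 0.0, -1.0, 2.0],
                [0.0, 1.0,  0.0, 0.0]])
decoder_dir = np.array([1.0, 0.0])
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('c', -1.0), ('b', 0.0)]
print(pos)  # [('d', 2.0), ('a', 1.0)]
```

Under this assumption, the small magnitudes in the lists (roughly ±0.06 to ±0.08) would reflect a weak direct effect of the feature on the output vocabulary.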
Activation density: 0.022%
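Activation density is usually read as the share of sampled tokens on which the feature fires. A density of 0.022% therefore means about 22 active tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and its inputs are hypothetical, and the page does not state its exact threshold or token sample.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    activations: 1-D array of one feature's activations over a token
    sample (assumed input; the page does not specify its sample).
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 22 active tokens out of 100,000 gives the density shown above.
acts = np.zeros(100_000)
acts[:22] = 1.0
print(activation_density(acts))  # 0.022
```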