INDEX
Explanations
sentences where the author expresses personal thoughts or reflections
Negative Logits
Token        Logit
Vacc         -0.69
ids          -0.65
RAW          -0.65
deadliest    -0.64
abases       -0.63
erity        -0.63
Delete       -0.61
ACY          -0.61
uca          -0.58
cles         -0.58
Positive Logits
Token        Logit
than         1.20
than         1.13
Than         0.81
ceremonial   0.75
ventional    0.74
emphasis     0.73
oot          0.73
traditional  0.67
necessity    0.66
pragmatic    0.65
Activations Density: 0.076%