Explanations
adjectives and descriptive phrases that suggest contrast or complexity
Negative Logits
ViewFeatures  -0.97
itſelf        -0.96
houſe         -0.93
purpoſe       -0.91
Houſe         -0.90
Jefus         -0.88
ſche          -0.85
Conſ          -0.84
pleaſure      -0.84
Efq           -0.83
Positive Logits
nakalista  0.55
in         0.55
CWE        0.55
it         0.52
,          0.51
since      0.49
!          0.47
ex         0.46
best       0.46
for        0.46
Activations Density 0.511%