INDEX
Explanations
references to the concept of "empiricism" and aspects related to beauty
Negative Logits
Reſ         -1.69
pleaſure    -1.61
Houſe       -1.61
doubtnut    -1.60
itſelf      -1.60
myſelf      -1.59
―――――       -1.55
Theſe       -1.52
ſelf        -1.51
ſeveral     -1.50
Positive Logits
(token not preserved)    0.91
emp         0.81
,           0.80
.           0.77
y           0.71
emp         0.69
h           0.69
ny          0.68
'           0.68
(           0.68
Activations Density 0.432%