INDEX
Explanations
actions described in the form "As you can see"
phrases indicating perception or observation
Negative Logits
oleon      -0.75
pan        -0.72
rang       -0.70
lam        -0.69
istries    -0.67
anmar      -0.65
ocaly      -0.64
addons     -0.64
Deng       -0.63
wcs        -0.63
Positive Logits
âĶĢ        0.72
ees        0.71
deduction  0.70
terday     0.69
(),        0.69
anecd      0.63
,.         0.61
.—         0.60
UME        0.60
,—         0.60
Activations Density 0.071%