Explanations
expressions of opinion or judgment about value or worth
Negative Logits
Token      Logit
fres       -0.15
imet       -0.14
engo       -0.14
elp        -0.13
Casual     -0.13
closest    -0.13
enci       -0.13
preserve   -0.13
ariat      -0.13
hte        -0.13
Positive Logits
Token      Logit
meaning    0.23
divide     0.21
double     0.20
triple     0.19
splitting  0.19
dividing   0.19
meanings   0.18
mean       0.18
splits     0.18
facts      0.18
Activations Density 0.064%