Explanations
phrases emphasizing the concept of value, importance, or significance
Negative Logits
maximum      -0.16
somehow      -0.16
.react       -0.15
maximum      -0.15
somewhere    -0.15
Maximum      -0.15
Maximum      -0.14
æŃ           -0.14
sez          -0.14
rave         -0.13
Positive Logits
degree        0.20
weight        0.19
urg           0.19
detail        0.18
importance    0.18
amount        0.18
effort        0.17
pressure      0.17
benefit       0.16
influence     0.16
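Both lists are the two ends of one per-token score ranking for this feature. The sketch below is a minimal illustration, assuming each vocabulary token already has a single scalar "logit effect" for the feature. On dashboards of this kind that score is commonly the feature's decoder direction projected through the model's unembedding, but the page does not say so, and that step is not shown here. The helper `top_and_bottom_k` is hypothetical, and the example scores are copied from the lists above.

```python
# Hypothetical sketch: pick the strongest positive and negative tokens from a
# precomputed per-token logit-effect score (assumed given; not computed here).

def top_and_bottom_k(logit_effects, k):
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    items = list(logit_effects.items())
    # Most positive first: these would populate "Positive Logits".
    top = sorted(items, key=lambda item: item[1], reverse=True)[:k]
    # Most negative first: these would populate "Negative Logits".
    bottom = sorted(items, key=lambda item: item[1])[:k]
    return top, bottom

# A few (token, score) pairs taken from the lists above.
example = {
    "degree": 0.20,
    "weight": 0.19,
    "importance": 0.18,
    "somehow": -0.16,
    "rave": -0.13,
    "sez": -0.14,
}
top, bottom = top_and_bottom_k(example, 2)
print(top)     # [('degree', 0.2), ('weight', 0.19)]
print(bottom)  # [('somehow', -0.16), ('sez', -0.14)]
```

Sorting the whole vocabulary is fine for a sketch. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.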
Activation Density: 0.100%
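A density of 0.100% means the feature is active on roughly 1 in 1,000 token positions. The sketch below assumes "density" is the percentage of positions where the feature's activation is strictly greater than zero. That threshold is an assumption, since the page does not define it. The function `activation_density` and the toy activations are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where the feature fires (activation > 0 is an assumed threshold).

def activation_density(activations):
    """Percentage of positions with a strictly positive activation."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# Toy data (hypothetical): the feature fires at 1 of 1,000 positions.
acts = [0.0] * 999 + [2.5]
print(f"{activation_density(acts):.3f}%")  # 0.100%
```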