Explanations
positive expressions and sentiments
expressions of admiration and appreciation
Negative Logits
erenn        -0.80
soever       -0.73
uria         -0.65
bable        -0.64
agues        -0.63
Else         -0.61
CENT         -0.60
istance      -0.59
operative    -0.59
predicate    -0.59
Positive Logits
how           1.85
how           1.38
HOW           1.12
why           1.02
How           0.98
what          0.90
HOW           0.81
How           0.76
seeing        0.75
whether       0.75
Activations Density 0.467%