INDEX
Explanations
personal opinions or beliefs expressed in texts
expressions of personal knowledge or certainty
Negative Logits
luster          -0.82
moderation      -0.69
emale           -0.68
smashing        -0.65
flation         -0.62
Reneg           -0.62
gren            -0.61
festive         -0.60
interstitial    -0.60
Dro             -0.59
Positive Logits
know            1.78
know            1.64
KNOW            1.64
knew            1.61
Know            1.60
knows           1.57
Know            1.55
knowing         1.42
Knowing         1.25
knowledge       1.24
Activations Density: 0.285%