INDEX
Explanations
the word "not" being emphasized in sentences
negations or expressions of denial
Negative Logits
stakes       -0.69
éĥ           -0.68
Companies    -0.67
Presence     -0.65
ership       -0.65
itor         -0.65
Tours        -0.64
iers         -0.61
ãĤ¼          -0.61
Circuit      -0.61
Positive Logits
icably        1.37
necessarily   1.28
icable        1.17
epad          1.14
hin           1.03
orious        0.95
exactly       0.93
yet           0.89
uncommon      0.88
eworthy       0.88
Activations Density 0.179%
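
Two of the negative-logit tokens, `éĥ` and `ãĤ¼`, look garbled but may simply be byte-level BPE tokens printed in their raw vocabulary form. The sketch below assumes a GPT-2-style byte-to-unicode table, which this page does not confirm; the helper names `token_to_bytes` and `decode_token` are hypothetical and used only for illustration. Under that assumption, each displayed character stands for one byte, and the bytes can be mapped back and decoded as UTF-8.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the model behind this dashboard uses this mapping.
    """
    # Printable bytes map to themselves.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # The remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed vocabulary string."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode as UTF-8; incomplete sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e3\u0124\u00bc", "\u00e9\u0125"]:  # the two tokens listed above
        print(repr(tok), token_to_bytes(tok).hex(), repr(decode_token(tok)))
```

If the assumption holds, `ãĤ¼` corresponds to the bytes `e3 82 bc`, a complete UTF-8 sequence, while `éĥ` corresponds to `e9 83`, the first two bytes of a three-byte character, so it cannot be shown as a single character on its own.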