INDEX
Explanations
adjectives conveying importance or severity
words associated with significance or urgency
Negative Logits
arettes      -0.83
runners      -0.81
parents      -0.79
stories      -0.78
users        -0.78
ometers      -0.76
ubi          -0.76
Controls     -0.76
owers        -0.75
aneers       -0.74
Positive Logits
endeavor     1.11
piece        1.05
feat         1.02
foray        1.01
distinction  1.00
thing        0.99
tale         0.99
scenario     0.97
beast        0.95
phenomenon   0.95
Activations Density 0.148%