INDEX
Explanations
references to superlatives or extremes
instances of the word "biggest" indicating significant or noteworthy events
Negative Logits
ences      -0.77
heid       -0.75
ttes       -0.75
onto       -0.72
ious       -0.71
jad        -0.71
anism      -0.71
actions    -0.70
ENC        -0.70
isson      -0.69
Positive Logits
gest             0.98
culprit          0.93
obstacle         0.90
hurdle           0.89
chunk            0.88
misconception    0.87
bang             0.87
takeaway         0.85
proponent        0.85
prize            0.82
Activations Density: 0.028%