INDEX
Explanations
phrases related to exertion or difficulty
phrases that indicate being challenged or under pressure
Negative Logits
succession      -0.70
dynamics        -0.65
legitimacy      -0.60
occurrence      -0.59
originate       -0.59
characteristic  -0.58
CTR             -0.58
divergence      -0.57
VERTISEMENT     -0.57
spurious        -0.57
Positive Logits
pressed         0.90
angering        0.79
icent           0.78
cerned          0.78
ped             0.75
eled            0.75
ivated          0.75
watching        0.74
nin             0.73
oing            0.73
Activations Density: 0.483%