Explanations
phrases indicating inevitability or strong potential for something to happen
terms associated with inevitability and consequence
Negative Logits
Token        Logit
activation   -0.62
gdala        -0.60
throats      -0.59
eeks         -0.59
talk         -0.57
ynthesis     -0.57
AMES         -0.57
chat         -0.55
papers       -0.55
helicop      -0.55
Positive Logits
Token        Logit
ingly        0.93
iously       0.87
ibly         0.87
ously        0.84
uously       0.83
uably        0.78
ly           0.76
entimes      0.75
ensibly      0.75
ossal        0.74
Activations Density: 0.264%