INDEX
Explanations
phrases emphasizing the continual aspect of actions or features
phrases emphasizing consistency and reliability
Negative Logits
ahime       -0.79
rig         -0.73
DF          -0.72
IDA         -0.71
iants       -0.69
Kardashian  -0.68
lations     -0.68
DAC         -0.68
SG          -0.68
åĤ          -0.68
Positive Logits
theless     0.91
entimes     0.89
dreamed     0.79
appreciated 0.79
overlooked  0.78
lurking     0.76
evolving    0.75
fascinated  0.75
thereafter  0.74
bothered    0.74
Activations Density 0.037%