Explanations
dialogue quoted speech
quoted speech or dialogue
Negative Logits
charm         -0.71
charms        -0.70
abandoning    -0.67
slapping      -0.66
favor         -0.66
bending       -0.66
vain          -0.65
imagination   -0.65
schedule      -0.65
neglect       -0.64
Positive Logits
Freedom       0.91
We            0.89
CDC           0.84
Ax            0.81
I             0.80
Reward        0.80
Never         0.80
Companies     0.80
Ku            0.78
Je            0.78
Activations Density 0.207%