INDEX
Explanations
phrases indicating a degree of intensity or comparison
phrases indicating a degree of messiness or complexity
Negative Logits
rats      -0.84
reys      -0.81
ħĭ        -0.80
metics    -0.77
pects     -0.76
rates     -0.76
uers      -0.75
oons      -0.74
ards      -0.74
atars     -0.74
Positive Logits
luck              1.01
overlap           0.85
extra             0.76
mischief          0.74
trouble           0.73
elbow             0.73
angu              0.72
irony             0.72
misinformation    0.71
realism           0.71
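Rankings like the two lists above are usually computed by taking the dot product of the feature's decoder direction with each token's unembedding vector and sorting the results. The page does not say how its own values were produced, so treat the following only as a minimal sketch under that assumption. The 3-dimensional vectors and the names `decoder_dir`, `unembed`, `logit_effects` and `top_and_bottom` are all hypothetical, and the numbers are not taken from the real model.

```python
# Hypothetical feature decoder direction (a real model has far more dimensions).
decoder_dir = [0.5, -1.0, 0.2]

# Hypothetical unembedding vectors for a handful of tokens.
unembed = {
    "luck":    [1.0, -0.5, 0.0],
    "overlap": [0.2, -0.6, 0.5],
    "rats":    [-0.4, 0.6, 0.1],
    "reys":    [0.0, 0.7, -0.5],
}

def logit_effects(direction, w_u):
    """Direct logit contribution of the feature direction to each token (dot product)."""
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in w_u.items()}

def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, 2)
print([t for t, _ in positive])  # ['luck', 'overlap']
print([t for t, _ in negative])  # ['reys', 'rats']
```

Sorting the whole dictionary is simple and works at this scale. For a full vocabulary, `heapq.nlargest` / `heapq.nsmallest` would avoid sorting every token.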
Activation Density: 0.044%
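Activation density on dashboards like this one is commonly defined as the fraction of sampled tokens on which the feature's activation is above zero. The page does not state its exact definition or sample size, so the following is only a sketch that assumes that definition. The function name `activation_density` and the synthetic 100,000-token sample are hypothetical, chosen so the result matches the figure shown above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation is strictly above `threshold`.

    Assumes the dashboard counts a token as "active" when activation > 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: the feature fires on 44 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(44):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.044%
```

A density this low means the feature is sparse: it fires on roughly one token in every 2,300.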