INDEX
Explanations
the adjective "moderate" and related terms
references to moderation in various contexts
Negative Logits
Lust      -0.72
STON      -0.70
hon       -0.70
borne     -0.69
arium     -0.69
stone     -0.69
yang      -0.69
RIP       -0.68
HCR       -0.68
GV        -0.68
Positive Logits
erate     1.10
yip       0.93
minded    0.92
moderate  0.91
moderate  0.91
leaning   0.80
mble      0.78
medi      0.77
sized     0.77
»Ĵ        0.77
Activations Density 0.012%