INDEX
Explanations
inflation, metaphors, maintain balance, mimicking historical
Negative Logits
gon         0.37
has         0.37
eval        0.37
approval    0.37
actions     0.37
vh          0.37
herb        0.37
pled        0.36
version     0.36
period      0.36
Positive Logits
ⵓ           0.46
孔          0.43
ອງ          0.43
🚄          0.42
любо        0.42
люби        0.41
bewe        0.39
瓮          0.39
وضع         0.39
підприєм    0.38
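The logit lists above rank vocabulary tokens by how strongly this feature pushes their next-token logits up or down. The dashboard does not say how the scores are computed; a common convention for SAE dashboards is to take the dot product of the feature's decoder vector with each token's unembedding vector and keep the top and bottom k. The sketch below assumes that convention and omits any scaling or normalization the dashboard may apply. The function name `top_logit_effects` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score every token by decoder_dir . unembed[token]
    (assumed convention) and return the k highest and k lowest scores."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending so the most negative effects come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy 2-dimensional example (made-up vectors, not real model weights).
toy_unembed = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [-1.0, 0.0],
    "d": [0.5, 0.5],
}
pos, neg = top_logit_effects([1.0, 0.2], toy_unembed, k=2)
```

With the toy vectors, `pos` holds tokens `a` and `d` and `neg` holds `c` and `b`. In a real model the dictionary would be replaced by the full unembedding matrix.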
Activation Density 0.001%
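An activation density of 0.001% means the feature fires on roughly 1 in 100,000 sampled tokens. The sketch below assumes density is the percentage of sampled tokens whose activation exceeds zero; the dashboard does not state its threshold or sample size, and the function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of sampled tokens on which the
    feature's activation is strictly above `threshold` (assumed 0.0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# One firing token out of 100,000 samples gives a density of 0.001%.
density = activation_density([0.0] * 99_999 + [2.3])
```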