INDEX
Explanations
phrases that indicate contrasting ideas or conditions
Negative Logits
adero      -0.14
ango       -0.14
237        -0.14
emmel      -0.14
ÙİÙĥ       -0.13
ei         -0.13
opp        -0.13
Reserved   -0.13
aucoup     -0.13
ahas       -0.13
Positive Logits
Nor        0.16
nor        0.16
Nor        0.15
Prem       0.14
rig        0.14
onda       0.14
foy        0.14
etheless   0.14
briefing   0.14
subs       0.14
Activations Density: 0.175%