Explanations
The word "Con" and its variations, indicating a focus on content concerning consequences, conclusions, or conventions.
Negative Logits
arp       -0.15
ña        -0.15
ritt      -0.15
cellent   -0.15
ENDING    -0.14
ensive    -0.14
Console   -0.14
ypad      -0.14
folio     -0.14
entimes   -0.14
Positive Logits
enen      0.19
aire      0.17
radi      0.16
exus      0.16
imbus     0.15
stan      0.15
tra       0.15
rig       0.14
prompt    0.14
nelly     0.14
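Per-token logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The page does not say how its values were computed, so the following is a minimal sketch that assumes this method. It uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and made-up toy weights, not the real model's:

```python
from typing import List, Tuple

def logit_effects(decoder_vec: List[float], unembed: List[List[float]]) -> List[float]:
    """Score every vocabulary token by projecting one feature's decoder direction.

    Assumed (hypothetical) layout: unembed[i][t] is the weight from residual
    dimension i to token t, so unembed has d_model rows and vocab columns.
    """
    d_model = len(decoder_vec)
    vocab = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]

def top_and_bottom(
    effects: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest scores first
    positive = ranked[::-1][:k]    # largest scores first
    return negative, positive

# Toy example: d_model = 2 and a vocabulary of 4 made-up tokens.
tokens = ["a", "b", "c", "d"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.25, -0.5, 0.0, 0.75],
    [0.5, 1.0, -0.5, 0.0],
]
effects = logit_effects(decoder_vec, unembed)
negative, positive = top_and_bottom(effects, tokens, k=2)
print(negative)  # [('c', -0.25), ('b', 0.0)]
print(positive)  # [('d', 0.75), ('a', 0.5)]
```

A positive score means that adding the feature's direction to the residual stream pushes that token's logit up. A negative score means it pushes the logit down.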
Activation density: 0.028%
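A density of 0.028% means the feature is active on roughly 28 of every 100,000 tokens, so it fires very sparsely. The page does not state its exact threshold or token sample. The following is a minimal sketch of the calculation, assuming density is the percentage of sampled tokens whose activation is above zero. `activation_density` is a hypothetical helper:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as "firing" on a token when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# 28 firing tokens out of 100,000 corresponds to a density of 0.028%.
acts = [0.0] * 99_972 + [1.3] * 28
print(activation_density(acts))  # 0.028
```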