Explanations
requests to write about love
Negative Logits
intellig    0.86
لها    0.86
\    0.80
land    0.79
cameras    0.78
lar    0.77
immers    0.77
لع    0.77
لية    0.75
los    0.74
Positive Logits
IN    1.04
was    0.93
ン    0.92
AZ    0.91
Varan    0.88
certificate    0.87
ED    0.84
a    0.84
ET    0.82
salaryfrom    0.82
Activations Density: 0.009%