Explanations
instances of admiration and appreciation
Negative Logits
Token        Logit
odoxy        -0.15
ologic       -0.14
ender        -0.14
steam        -0.14
主義         -0.14
western      -0.14
/Dk          -0.13
ways         -0.13
ivate        -0.13
Ãłi          -0.13
Positive Logits
Token        Logit
egas         0.16
_CTL         0.15
738          0.15
ble          0.15
acle         0.15
ué           0.14
thora        0.14
ideographic  0.14
iors         0.14
ATA          0.14
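The page does not say how the logit lists are produced. A common approach in dashboards of this kind is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive entries. The sketch below assumes exactly that; the function name top_logit_tokens and the arguments decoder_dir and unembed are hypothetical, and the toy numbers are made up for illustration.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Hypothetical sketch: project a feature's decoder direction onto the
    unembedding (shape d_model x n_vocab) and return the k most positive and
    k most negative (token, logit) pairs. Assumes this is how the lists are built."""
    d_model = len(decoder_dir)
    # logit[j] = sum_i decoder_dir[i] * unembed[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional "model" with a 4-token vocabulary (made-up values).
    pos, neg = top_logit_tokens(
        [1.0, -0.5],
        [[0.2, 0.0, -0.1, 0.3], [0.0, 0.4, 0.4, -0.2]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print([t for t, _ in pos])  # ['d', 'a']
    print([t for t, _ in neg])  # ['c', 'b']
```

Under this assumption the logit values measure how strongly the feature, when active, pushes each token's output logit up or down.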
Activation Density: 0.008%
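Assuming "activation density" means the fraction of sampled token positions on which the feature activates above zero (the page does not define it), 0.008% corresponds to roughly 8 firing positions per 100,000 tokens. A minimal sketch of that calculation; the function name activation_density and the threshold parameter are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: fraction of token positions where the feature's
    activation exceeds `threshold` (assumed definition of density)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 8 firing positions out of 100,000 sampled tokens (made-up sample).
    sample = [0.0] * 99_992 + [1.0] * 8
    print(f"{activation_density(sample):.3%}")  # 0.008%
```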