INDEX
Explanations
words that suggest strong emotions or social dynamics
Negative Logits
ppard             -0.18
ayed              -0.17
Friedrich         -0.15
rint              -0.15
pper              -0.14
RNA               -0.14
aying             -0.14
adır              -0.14
ening             -0.14
IFF               -0.14
Positive Logits
achs              0.17
.AddListener      0.16
νι                0.16
mai               0.15
alli              0.15
tam               0.14
ains              0.14
ìĽħ               0.14
ekim              0.14
.ActionListener   0.13
Activations Density: 0.002%