Explanations
positive descriptors related to experiences
physical states and actions
Negative Logits
Dasar         -0.30
verdaderas    -0.29
silêncio      -0.28
zarar         -0.27
medlemmer     -0.27
davran        -0.26
verdaderos    -0.26
sagesse       -0.26
goutte        -0.26
förs          -0.25
Positive Logits
linawan       0.68
AndEndTag     0.65
niſſe         0.65
queſto        0.65
ftagPool      0.62
࿊             0.61
ſehen         0.60
geſch         0.60
➌             0.60
ſeinen        0.60
Activations Density: 0.021%