INDEX
Explanations
words and phrases indicating quality or evaluation of objects or experiences
Negative Logits
juan         -0.18
swe          -0.16
sweat        -0.16
plet         -0.16
itemprop     -0.16
Swe          -0.16
.criteria    -0.15
rb           -0.15
öm           -0.14
pler         -0.13
Positive Logits
anchors      0.17
udden        0.16
ë²Į          0.16
mens         0.16
tem          0.16
TEM          0.15
ünchen       0.14
yk           0.14
'][]         0.14
оÑĢе         0.14
Activations Density: 0.006%