INDEX
Explanations
words that convey authenticity and truthfulness, or high-value attributes
Negative Logits
separ         -0.57
aza           -0.51
Orient        -0.49
Parse         -0.49
piram         -0.48
jungle        -0.47
Osi           -0.47
pisa          -0.47
Christiane    -0.47
ali           -0.46
Positive Logits
viņ           0.54
berdayakan    0.47
vocês         0.47
graduación    0.46
lección       0.45
незавершена   0.45
lentejuelas   0.44
bermanfaat    0.44
pierna        0.44
Cheese        0.43
Activations Density: 0.227%