INDEX
Explanations
affirmative phrases indicating decisions or choices
Negative Logits
ÏĢί     -0.16
486     -0.16
489     -0.15
rina    -0.14
ime     -0.14
hardt   -0.14
สมà¸ļ    -0.14
crow    -0.14
mine    -0.13
rita    -0.13
Positive Logits
adf             0.17
iosper          0.15
odes            0.15
Responsibility  0.14
aths            0.14
Lama            0.14
Sür             0.14
alion           0.14
nieu            0.14
ony             0.14
Activations Density 0.034%
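
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of token positions on which the feature fires (activation above zero). That definition, the function name activation_density, and the sample data are all assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is a simple firing rate over sampled tokens.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 34 of which activate the feature.
sample = [0.0] * 99_966 + [1.2] * 34
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.034%
```

Under this reading, a density of 0.034% would correspond to the feature firing on roughly 34 of every 100,000 tokens, which is a sparse feature.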