Explanations
"very" followed by descriptive adjectives
NEGATIVE LOGITS
ق    1.78
ب    1.75
ו    1.73
o    1.70
กว่า    1.61
بوت    1.59
د    1.57
ರ್ಧ    1.52
ות    1.51
ن    1.43
POSITIVE LOGITS
ᅱ    1.63
}$    1.55
ं    1.47
ﺮ    1.44
OOL    1.41
ı    1.41
day    1.40
ை    1.37
也非常    1.36
ru    1.34
Activations Density: 0.108%