INDEX
Explanations
Internal consistency, you need
Negative Logits
ozo  1.61
कुंठ  1.59
গাঁও  1.57
lL  1.55
compressibility  1.54
ی  1.52
س  1.45
ือบ  1.43
幾  1.42
adip  1.42
Positive Logits
hate  1.60
banda  1.41
ungu  1.29
ve  1.29
wort  1.26
forall  1.25
ра  1.24
ditt  1.22
ob  1.22
hated  1.18
Activations Density: 0.002%