INDEX
Explanations
Kav, Kaw, or Cav followed by names
Negative Logits
ňte      -0.82
할       -0.78
aliere   -0.69
🦵       -0.69
ńcz      -0.69
ramo     -0.69
辛       -0.69
chua     -0.69
خواه     -0.69
ы        -0.69
Positive Logits
tius        0.85
/***        0.84
hept        0.77
chero       0.75
すじ        0.75
参り        0.75
shrinking   0.75
BT          0.74
"}";        0.74
疽          0.73
Activations Density 0.015%