Explanations
beginnings of sentences or titles
Negative Logits
Token          Logit
for            -1.05
longstanding   -1.02
some           -1.00
kaik           -0.98
both           -0.92
excellent      -0.92
all            -0.91
only           -0.90
strong         -0.89
two            -0.85
Positive Logits
Token          Logit
ケーブル        0.97
doctrina       0.95
レイン          0.94
Jacobi         0.93
atuan          0.92
ウール          0.90
exposé         0.88
tuesday        0.87
permett        0.85
葯              0.85
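Dashboards of this kind commonly derive logit lists by projecting the feature's decoder direction onto the model's unembedding: each token's score is the dot product of the decoder vector with that token's unembedding vector, and the lists report the most negative and most positive scores. The sketch below assumes that definition. It is a minimal pure-Python illustration, not this dashboard's code. The names `logit_effects` and `top_logits` and the two-dimensional toy vectors are hypothetical and do not reproduce the values listed above.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_row: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed definition)."""
    return {tok: sum(d * u for d, u in zip(decoder_row, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Hypothetical toy data: a 2-d decoder direction and 2-d unembedding vectors.
decoder_row = [1.0, -0.5]
unembed = {
    "for": [-1.0, 0.2],
    "both": [-0.8, 0.0],
    "exposé": [0.6, -0.4],
    "tuesday": [0.4, -0.6],
}

negative, positive = top_logits(logit_effects(decoder_row, unembed), k=2)
print(negative)
print(positive)
```

In a real model, the unembedding has one column per vocabulary entry, so the same ranking runs over tens of thousands of tokens. That is why the extremes can mix unrelated words, subword fragments such as "permett", and tokens from several scripts.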
Activation Density: 0.004%
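Activation density is usually read as the percentage of sampled tokens on which the feature fires, so 0.004% corresponds to roughly 4 tokens in 100,000 (about 1 in 25,000). The sketch below assumes a feature counts as firing when its activation is strictly above a threshold of 0. The function name `activation_density` and the synthetic activation list are hypothetical illustrations, not this dashboard's data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed firing rule)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Synthetic example: 100,000 tokens, of which 4 have a positive activation.
acts = [0.0] * 99_996 + [1.2, 0.5, 3.0, 0.1]
print(f"{activation_density(acts):.3f}%")  # prints "0.004%"
```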