Explanations
The word "so"; also matches some words ending in "ware" and "such".
Negative Logits
Token       Logit
so          -3.56
so          -1.88
så          -1.87
così        -1.78
так         -1.65
begitu      -1.52
sehingga    -1.28
כך          -1.27
如此        -1.23
niin        -1.23
Positive Logits
Token       Logit
Diſ         0.91
Reſ         0.91
ſche        0.88
pleaſure    0.87
Conſ        0.84
houſe       0.84
Efq         0.84
reaſon      0.84
ſtate       0.84
whoſe       0.81
Activations Density: 2.377%
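
An activation density figure like the one above is commonly read as the fraction of sampled token positions on which the feature is active. The page does not say how its value was computed, so the sketch below assumes that definition; the function name, the threshold parameter, and the toy data are all hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 24 of them active.
toy_activations = [0.0] * 976 + [1.5] * 24
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

A density in the low single-digit percent range would, under this reading, mean the feature is silent on the large majority of tokens.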