Explanations
words indicating ongoing or repeated actions
Negative Logits
oad        -0.16
erton      -0.16
lier       -0.16
liers      -0.15
赤         -0.14
acco       -0.14
ürlich     -0.13
talented   -0.13
asl        -0.13
akhir      -0.13
Positive Logits
around     0.24
providing  0.19
around     0.19
Around     0.19
Around     0.19
voted      0.18
called     0.17
described  0.17
fixtures   0.16
active     0.16
Activations Density: 0.043%