INDEX
Explanations
references to time and comparison between past and present states
Negative Logits
ifen      -0.19
peria     -0.16
niž       -0.15
porno     -0.15
slaught   -0.15
mast      -0.14
ÑĨеÑĢ      -0.14
debit     -0.14
abstract  -0.14
olla      -0.14
Positive Logits
agan      0.16
Interop   0.15
tle       0.14
éĶĻ       0.14
wick      0.14
those     0.14
beck      0.13
titre     0.13
hammer    0.13
Previous  0.13
Activations Density 0.043%