INDEX
Explanations
references to the word "pol" with different suffixes
Negative Logits
ACTED          -0.68
INESS          -0.67
rament         -0.64
LY             -0.64
rers           -0.64
uring          -0.61
Apprentice     -0.61
ORGE           -0.61
Revel          -0.60
------------   -0.59
Positive Logits
anski          1.19
itely          1.16
ikarp          1.02
ander          1.00
anco           1.00
iov            1.00
tical          0.99
arity          0.95
atility        0.95
ski            0.94
Activations Density 0.038%
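
A minimal sketch of how a density figure like the one above could be computed, assuming that density means the fraction of sampled tokens on which the feature's activation is above zero. The dashboard does not state its definition, so that reading is an assumption, and the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is (number of active tokens) / (total tokens sampled).
    """
    if len(activations) == 0:
        raise ValueError("activations must contain at least one value")
    # Count the tokens on which the feature fires at all.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: the feature fires on 1 token out of 4.
toy = [0.0, 0.0, 0.5, 0.0]
print(f"{activation_density(toy):.3%}")
```

Under this reading, a density of 0.038% would correspond to the feature firing on roughly 38 out of every 100,000 sampled tokens.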