Explanations
phrases that express continuity or a consistent presence over time
Negative Logits
Token      Logit
Harding    -0.16
soon       -0.16
247        -0.16
ule        -0.15
ellar      -0.15
£          -0.15
ustum      -0.14
inks       -0.14
ook        -0.14
Wyn        -0.14
Positive Logits
Token      Logit
emek       0.15
ETO        0.15
aklı       0.15
argout     0.14
trh        0.14
edo        0.14
apesh      0.14
einz       0.14
strap      0.14
ninger     0.14
Activations Density: 0.030%