INDEX
Explanations
repetitive phrases or statements emphasizing similarity or sameness
Negative Logits
Token      Logit
ly         -0.19
rious      -0.16
Own        -0.15
land       -0.15
ses        -0.15
iesta      -0.15
lio        -0.15
Certain    -0.15
ng         -0.14
ric        -0.14
Positive Logits
Token      Logit
-sex       0.43
thing      0.40
exact      0.28
kind       0.27
amount     0.26
sort       0.25
-old       0.24
basic      0.24
-sized     0.23
kinds      0.23
Activations Density: 0.066%