Explanations
phrases emphasizing the concept of possession or belonging
Negative Logits
atel      -0.17
wed       -0.16
pol       -0.14
frame     -0.14
stan      -0.14
stump     -0.14
Kend      -0.14
fl        -0.14
alike     -0.14
frame     -0.13
Positive Logits
ordes     0.16
icone     0.16
oru       0.16
UBLE      0.15
UPLE      0.15
uple      0.14
usk       0.14
518       0.14
enstein   0.14
kır       0.14
Activations Density: 0.138%