Explanations
elements related to historical events and backgrounds of individuals, particularly in academia or sports
Negative Logits
token      logit
were       -0.25
Were       -0.22
Were       -0.22
weren      -0.21
were       -0.20
šli        -0.16
waren      -0.16
Booster    -0.16
сказ       -0.15
itals      -0.15
Positive Logits
token      logit
ierte      0.35
gte        0.34
igte       0.33
pte        0.30
te         0.29
erte       0.28
zte        0.28
nte        0.27
kte        0.27
onte       0.27
Activations Density 0.025%