INDEX
Explanations
history, geography, and extent
Negative Logits
Token        Value
0            0.50
pux          0.45
bestimmten   0.43
szok         0.43
تين          0.41
新的         0.41
ä            0.41
obat         0.41
やっぱり     0.41
副           0.40
Positive Logits
Token        Value
throughout   0.73
Throughout   0.59
Throughout   0.56
분야         0.53
tutta        0.48
THRO         0.48
вси          0.48
всички       0.48
усі          0.46
всю          0.45
Activations Density 0.004%