Explanations
characteristics and features of items or entities, especially in a descriptive or analytical context
Negative Logits
axe      -0.20
ers      -0.19
ott      -0.16
iej      -0.16
ensch    -0.16
owl      -0.16
agne     -0.16
omb      -0.15
avi      -0.15
emb      -0.15
Positive Logits
s        0.38
t        0.34
tica     0.32
nge      0.29
ska      0.28
ï¸ı      0.28
tte      0.28
tal      0.27
sar      0.27
e        0.27
Activations Density: 0.052%