INDEX
Explanations
proper names of people and places
repeated references to a specific individual or entity
Negative Logits
influential     -0.71
capacity        -0.68
mushroom        -0.68
fascination     -0.67
wartime         -0.66
unwanted        -0.66
incorpor        -0.65
contribution    -0.65
mobil           -0.65
Gaul            -0.64
Positive Logits
ï¸ı             1.16
efe             1.09
cause           1.03
tis             0.94
nob             0.92
sung            0.92
STEM            0.90
sic             0.89
¯               0.89
endif           0.88
Activations Density 0.188%