INDEX
Explanations
proper nouns and technical terms related to news articles or scientific research
Negative Logits
ifax      -0.48
usions    -0.46
ibur      -0.41
sing      -0.37
onies     -0.36
ingham    -0.36
scl       -0.36
aring     -0.35
rox       -0.35
awks      -0.35
Positive Logits
BIL       0.48
KER       0.44
FORE      0.43
KE        0.42
GER       0.41
PRES      0.40
EG        0.38
Ger       0.38
ADE       0.36
KA        0.36
Activations Density: 0.035%