Explanations
words related to specific proper nouns, potentially names or places
abbreviations or acronyms, particularly those related to academic roles or organizations
Negative Logits
PF             -0.75
gears          -0.70
igham          -0.69
ĸļ             -0.65
ĨĴ             -0.65
magnification  -0.64
peas           -0.61
constants      -0.60
ibaba          -0.59
levers         -0.59
Positive Logits
ause            0.73
vantage         0.69
aline           0.68
ainment         0.66
anamo           0.66
ucl             0.65
alty            0.64
ublic           0.62
ITION           0.62
vernment        0.61
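A minimal sketch of how ranked lists like the negative and positive logit lists could be produced. It assumes the dashboard assigns every vocabulary token one scalar "logit effect" for this feature (commonly the feature's decoder direction projected through the model's unembedding, but that derivation is an assumption, not something this page states) and then shows the k lowest and k highest scores. The helper name `top_and_bottom_tokens` and the toy score dictionary are hypothetical.

```python
from typing import Dict, List, Tuple


def top_and_bottom_tokens(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs.

    `logit_effects` maps each vocabulary token to the feature's assumed
    scalar effect on that token's logit (hypothetical input format).
    """
    # Sort ascending by score: most negative effects come first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return negative, positive


# Toy scores reusing a few values from the lists, plus one near-neutral token.
scores = {"PF": -0.75, "gears": -0.70, "ause": 0.73, "vantage": 0.69, " the": 0.01}
neg, pos = top_and_bottom_tokens(scores, k=2)
print(neg)  # [('PF', -0.75), ('gears', -0.7)]
print(pos)  # [('ause', 0.73), ('vantage', 0.69)]
```

Only the ordering matters here: tokens with near-zero effect (like " the" in the toy data) never appear in either list.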
Activation Density: 0.125%
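Assuming "density" means the fraction of sampled tokens on which the feature's activation is strictly positive (a common convention, not confirmed by this page), 0.125% corresponds to roughly 1 token in 800. A minimal sketch; `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 800 tokens with a single nonzero activation gives 1/800 = 0.125%.
acts = [0.0] * 799 + [3.2]
print(f"{activation_density(acts):.3%}")  # 0.125%
```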