INDEX
Explanations
contractions of "it is" with high activations
pronominal references to the possessive form "its"
Negative Logits
Thib      -0.80
rette     -0.73
ãĤ¹ãĥĪ    -0.68
eering    -0.67
Trend     -0.66
stad      -0.65
rum       -0.65
ij士      -0.65
roups     -0.64
ozy       -0.64
Positive Logits
ELF           1.16
own           1.08
predecessor   0.89
elf           0.87
self          0.87
apparent      0.83
predecessors  0.82
sembly        0.82
respective    0.78
asca          0.78
Activations Density 0.091%