Explanations
phrases related to lists or enumerations
references to various unspecified items or concepts
Negative Logits
Token        Logit
irl          -0.71
inav         -0.71
strom        -0.68
oufl         -0.66
adia         -0.66
bern         -0.65
hom          -0.65
ECD          -0.64
oug          -0.64
sson         -0.63
Positive Logits
Token        Logit
happening     0.96
happens       0.92
happen        0.90
happened      0.87
things        0.85
happ          0.82
Things        0.81
Happ          0.77
facts         0.76
matters       0.76
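The page does not say how the logit scores were computed. A common convention for sparse-autoencoder feature dashboards (an assumption here) is to project the feature's decoder direction onto the model's unembedding matrix and list the highest and lowest scoring tokens. The minimal sketch below shows that ranking on toy data; the function names and the tiny vocabulary are hypothetical.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> list[float]:
    """Score every vocabulary token as decoder_dir @ w_unembed.

    Assumption: decoder_dir has length d_model and w_unembed has
    d_model rows and vocab_size columns.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
tokens = ["happening", "things", "irl", "inav"]
decoder_dir = [1.0, -0.5]
w_unembed = [
    [0.6, 0.5, -0.4, -0.1],
    [-0.2, 0.1, 0.6, 0.2],
]
scores = logit_effects(decoder_dir, w_unembed)
positive, negative = top_and_bottom(scores, tokens, k=2)
print("positive:", positive)
print("negative:", negative)
```

With these toy numbers, "happening" and "things" rank as the positive tokens and "irl" and "inav" as the negative ones, mirroring the shape of the two lists above without reproducing their actual values.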
Activation Density: 0.037%
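Activation density is usually read as the share of sampled tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above zero and that the sample is a flat list of per-token activations; both are assumptions, as are the helper names. At 0.037%, the feature fires on roughly one token in 2,700.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds threshold (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def one_in_n(density: float) -> int:
    """Express a density as 'fires on roughly one token in N'."""
    return round(1 / density)


print(activation_density([0.0, 0.0, 2.3, 0.0]))  # 0.25
print(one_in_n(0.037 / 100))                     # 2703
```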