INDEX
Explanations
instances of the word "many."
Negative Logits
Token    Logit
UAL      -0.75
anut     -0.75
icism    -0.71
OPE      -0.68
istan    -0.68
agame    -0.67
atum     -0.67
ften     -0.66
acus     -0.65
oulos    -0.65
Positive Logits
Token         Logit
facets        1.12
times         1.05
aspects       1.02
kinds         0.96
different     0.95
thousands     0.91
occasions     0.89
ways          0.88
unanswered    0.86
body          0.86
Activations Density: 0.073%