INDEX
Explanations
numerical quantities or counts
occurrences of the word "number" followed by numerical values
Negative Logits
Token      Logit
missible   -0.77
sein       -0.74
rador      -0.74
Despair    -0.73
idden      -0.71
anium      -0.69
rament     -0.66
separat    -0.66
Thou       -0.64
agate      -0.64
Positive Logits
Token      Logit
otom       0.71
plate      0.68
number     0.66
of         0.66
onement    0.66
encies     0.64
othe       0.63
666        0.62
thousand   0.62
metry      0.60
Activations Density 0.027%
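To put the density figure in concrete terms, here is a minimal sketch. It assumes that activation density means the fraction of tokens on which the feature's activation is strictly positive. That definition, the function name `activation_density`, and the synthetic data are illustrative assumptions, not taken from the dashboard itself. Under this reading, 0.027% works out to roughly 27 active tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Hypothetical helper: fraction of tokens whose activation is > 0.

    Assumes density counts strictly positive activations; the dashboard's
    exact definition may differ.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Synthetic example: 27 active tokens out of 100,000 (made-up data).
acts = [0.0] * 100_000
for i in range(27):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.027%
```

A density this low means the feature is silent on almost every token, which fits the narrow "number followed by a numerical value" explanation.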