INDEX
Explanations
items separated by commas or 'or'
Negative Logits
decree      0.47
position    0.45
form        0.44
getter      0.44
law         0.43
world       0.42
ders        0.42
comprim     0.42
deci        0.42
process     0.41
Positive Logits
“‘      0.94
「       0.91
%``     0.86
“       0.78
“       0.77
"'      0.77
「       0.75
`       0.74
"       0.73
"¿      0.72
Activations Density 0.366%