INDEX
Explanations
proper nouns, specifically names of people
occurrences of the word "More" and its variations in different contexts
NEGATIVE LOGITS
IPS        -0.68
oes        -0.66
Cro        -0.66
liest      -0.64
%]         -0.62
oresc      -0.62
ividual    -0.61
OCK        -0.61
IP         -0.61
keeping    -0.60
POSITIVE LOGITS
than          1.28
Than          0.93
importantly   0.91
HUD           0.84
likely        0.79
than          0.79
ened          0.78
closely       0.78
ado           0.78
models        0.74
Activations Density 0.102%