Explanations
words indicating quantities or articles in various forms
Negative Logits
houſe        -0.80
myſelf       -0.78
fumée        -0.76
poussière    -0.75
himſelf      -0.74
Phry         -0.72
Assyrian     -0.72
Majefty      -0.72
themſelves   -0.71
paille       -0.70
Positive Logits
a            0.96
{}",         0.95
large        0.90
]))          0.88
few          0.86
great        0.85
huge         0.84
"):          0.82
hundred      0.81
particular   0.80
Activations Density: 0.009%