Explanations
fractional values, such as measurements in halves
occurrences of specific numerical values and names
Negative Logits
rine     -0.92
alia     -0.81
ories    -0.78
opher    -0.77
utical   -0.77
rums     -0.77
alus     -0.75
ancy     -0.75
uating   -0.74
rina     -0.73
Positive Logits
Ò          0.77
cffffcc    0.73
abouts     0.70
Flavoring  0.68
jud        0.68
uberty     0.68
hearts     0.67
WAYS       0.65
Mandela    0.65
xual       0.63
Activations Density: 0.030%