INDEX
Explanations
repeated mentions of the word "one"
the word "one"
the word one
Negative Logits
Token        Logit
tably        -0.70
ly           -0.70
Isidro       -0.68
ſch          -0.67
dır          -0.64
ably         -0.64
fully        -0.61
București    -0.61
―――――        -0.58
Græ          -0.58
Positive Logits
Token              Logit
Ones               1.05
One                0.98
hundred            0.96
ONE                0.93
iric               0.93
Ones               0.92
estimés            0.90
One                0.88
-------------</    0.87
בודה               0.86
Activations Density 0.173%
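
The dashboard does not say how the density figure is computed. One common reading is the fraction of sampled tokens on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the threshold, and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all. The dashboard itself does not confirm this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 tokens, the feature fires on 3 of them.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.173% would mean the feature fires on roughly 17 of every 10,000 tokens.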