Explanations
instances of the word "Fr" followed by a number
references to the French language or culture
Negative Logits
Twain       -0.87
eers        -0.82
eering      -0.67
Blazers     -0.66
Archdemon   -0.66
>[          -0.66
lihood      -0.65
6666        -0.65
eer         -0.64
ãĥŁ         -0.64
Positive Logits
ictional    0.95
aternity    0.93
aternal     0.92
umpy        0.92
Fr          0.90
annie       0.89
inge        0.89
antz        0.88
Fr          0.87
atell       0.86
Activations Density 0.006%