INDEX
Explanations
references to the word "banana" and its variations
Negative Logits
barba       -0.80
ם           -0.79
Keating     -0.75
Turkey      -0.71
fuma        -0.70
Jennings    -0.70
Strom       -0.69
opat        -0.69
Bronnen     -0.68
chak        -0.67
Positive Logits
Saxons      1.00
Wirt        0.95
defStyle    0.94
Radu        0.94
Irishman    0.94
Arad        0.93
Kuh         0.89
Tric        0.89
Niels       0.88
Stonehenge  0.87
Activations Density: 2.250%