INDEX
Explanations
words containing non-English characters, specifically umlauts and other accents
specific names and terms associated with individuals and locations
Negative Logits
Token       Logit
Debor       -0.83
Fired       -0.73
Reef        -0.73
Rhino       -0.69
scorp       -0.68
ORK         -0.67
Turtles     -0.66
BILITIES    -0.65
yrinth      -0.64
ombat       -0.64
Positive Logits
Token       Logit
ön          1.44
vironment   1.00
ning        0.85
ü           0.83
bach        0.81
ä           0.79
icht        0.79
kamp        0.77
agar        0.76
ollen       0.76
Activations Density: 0.008%