INDEX
Explanations
phrases and words that indicate temporal references or sequence of events
invention, comparison, or specific treatments
Negative Logits
featureID        -0.88
Италијани        -0.85
iſchen           -0.82
niſſe            -0.79
httphttps        -0.78
***!             -0.77
iſche            -0.77
Wikimedijinoj    -0.77
WireFormatLite   -0.75
ſſung            -0.75
Positive Logits
vœ               0.42
désir            0.33
adjunto          0.32
derfor           0.32
mož              0.32
curieux          0.31
vábbi            0.31
therefore        0.31
Investigación    0.31
Espíritu         0.31
Activations Density: 0.129%