INDEX
Explanations
negations or distinctions that emphasize what is not typically included or expected
Negative Logits
kasarigan     -0.42
femininas     -0.39
sumpay        -0.36
illeur        -0.36
FBref         -0.35
oughby        -0.35
raiſ          -0.35
étoit         -0.35
volto         -0.35
almendras     -0.34
Positive Logits
transfieras   0.59
brigens       0.54
="@+          0.50
OGND          0.50
GEBURTSDATUM  0.47
="#"><        0.47
theoremstyle  0.44
rungsseite    0.44
beginnetje    0.44
iprot         0.43
Activations Density 0.125%
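
To put the density figure in concrete terms: the page does not define the metric, so the sketch below assumes that activation density means the percentage of sampled tokens on which the feature has a nonzero activation. Under that assumption, 0.125% corresponds to roughly one active token in every 800. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 800 tokens, the feature fires on exactly one of them.
sample = [0.0] * 799 + [3.2]
print(f"{activation_density(sample):.3f}%")
```

A density this low would suggest a sparse feature that fires on a narrow set of contexts, which fits the fairly specific explanation given above.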