INDEX
Explanations
references to specific categories or characteristics
Negative Logits
Économie       -0.57
rağmen         -0.56
normaux        -0.56
llorar         -0.55
respectivas    -0.54
toekomst       -0.52
supérieurs     -0.51
perfeita       -0.51
économies      -0.49
devriez        -0.49
Positive Logits
kinds            1.00
specific         0.96
types            0.93
amount           0.93
particular       0.91
kind             0.88
kinds            0.79
特定             0.79
circumstances    0.78
particular       0.78
Activations Density 0.135%