INDEX
Explanations
model enthusiastic affirmation
Negative Logits
fundador  0.98
σύμφωνα  0.92
генерал  0.86
десят  0.86
questa  0.85
इसका  0.85
ध्वस्त  0.84
ில்லியன்  0.84
fica  0.84
थ्री  0.84
Positive Logits
important  0.79
مهم  0.75
important  0.74
중요  0.72
initiative  0.71
asy  0.69
preferences  0.69
लोकप्रियता  0.68
goals  0.65
goals  0.62
Activations Density: 0.048%