Explanations
takes courage and integrity
NEGATIVE LOGITS
Token           Value
enthusiasm      0.88
enthusiastic    0.80
enthusi         0.77
enthous         0.73
entusiasmo      0.71
उत्सा            0.71
entusi          0.71
Enthusi         0.70
ambitious       0.68
friendly        0.67
POSITIVE LOGITS
Token           Value
Character       1.00
character       0.97
integrity       0.96
Character       0.94
character       0.93
integrity       0.88
karakter        0.88
Integrity       0.86
integridad      0.83
Charakter       0.77
Activations Density 0.028%