INDEX
Explanations
adjectives related to human qualities and behavior, such as bravery, stupidity, sincerity, and hypocrisy
characteristics related to personal qualities or societal critiques
Negative Logits
Token        Logit
romeda       -0.69
ゴン         -0.68
itionally    -0.66
iris         -0.66
tainment     -0.64
nesium       -0.62
ado          -0.59
amera        -0.58
ァ           -0.57
ourt         -0.57
Positive Logits
Token        Logit
exhibited    1.12
inherent     1.06
displayed    0.99
evident      0.95
lessness     0.92
afforded     0.91
shown        0.89
exercised    0.84
of           0.83
required     0.82
Activations Density: 0.169%
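
Some token strings in logit tables like the ones above come from a byte-level BPE vocabulary, where non-ASCII text is spelled with stand-in characters: `ãĤ´ãĥ³` is the vocabulary spelling of ゴン, and `ãĤ¡` of ァ. The sketch below shows one way to convert between the two forms. It assumes the underlying model uses the GPT-2 style byte-to-character table, which the page itself does not state; the function names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character.

    Assumption: the model behind this dashboard uses this byte-level BPE scheme.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    # Every remaining byte is shifted to an unused code point starting at 256.
    offset = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(cp) for b, cp in zip(byte_values, code_points)}


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a vocabulary spelling back into readable text.

    Bytes that do not form valid UTF-8 (partial characters) become U+FFFD.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ´ãĥ³", "ãĤ¡", "exhibited"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `exhibited` pass through unchanged, so the same function can be applied to every row of a table without special-casing. A token that holds only part of a multi-byte character cannot be decoded on its own, which is why the sketch substitutes a replacement character instead of raising an error.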