Explanations
expressions about the significance or relevance of a concept or phenomenon
Negative Logits
major        -0.40
Autoritní    -0.40
有力         -0.36
major        -0.34
叫           -0.33
funny        -0.31
openModal    -0.31
shiny        -0.30
studio       -0.30
덟           -0.30
Positive Logits
importance     3.59
Importance     3.16
importance     3.11
Importance     3.06
importancia    2.67
importância    2.44
importanza     2.30
significance   2.23
importanza     2.13
Bedeutung      2.08
Activations Density 0.080%