Explanations
principal components or reading comprehension
Negative Logits
戞            0.80
戝            0.80
Aula          0.75
ursor         0.69
yny           0.67
纭            0.66
ائر           0.64
காவல்துற      0.64
맺            0.63
council       0.63
Positive Logits
comp          3.68
COMP          3.61
Comp          3.59
Comp          3.51
COMP          3.40
comp          3.35
Kom           3.34
compon        3.29
component     3.24
컴            3.17
Activations Density 0.601%