Explanations
communication and human connection
Negative Logits
Token       Value
۔۔          0.50
s           0.47
predictor   0.46
esa         0.45
decision    0.45
definitely  0.44
c           0.43
it          0.42
Supporters  0.42
variável    0.41
Positive Logits
Token          Value
Glue           0.60
Leadership     0.54
Combined       0.53
ihtiy          0.51
Orientation    0.51
intervento     0.50
נו             0.50
Interruptions  0.49
ゴ             0.49
מא             0.49
Activations Density 0.010%