Explanations
works well or resonates most
Negative Logits
Token     Value
m         0.48
bon       0.47
'         0.47
ne        0.46
change    0.46
h         0.46
0         0.46
’         0.44
to        0.43
PORT      0.43
Positive Logits
Token            Value
encanta          0.54
やすい           0.51
emphatically     0.51
dearly           0.50
legjob           0.50
fortemente       0.50
banget           0.50
బాగా             0.50
hyvin            0.49
perfectamente    0.48
Activations Density: 0.111%