Explanations
assertions and beliefs about various subjects
Negative Logits
houſe     -0.84
Houſe     -0.82
Anſ       -0.70
itſelf    -0.69
Tikang    -0.69
pleaſure  -0.69
témoig    -0.69
ſta       -0.68
يتيمه     -0.67
ſont      -0.67
Positive Logits
believes          0.60
believe           0.51
считает           0.46
felt              0.46
Felt              0.45
считают           0.41
setVerticalGroup  0.41
believed          0.40
think             0.39
Felt              0.38
Activations Density: 0.549%