Explanations
introvert extrovert personality
NEGATIVE LOGITS
Token          Value
थी             0.41
filth          0.41
ridiculous     0.39
distort        0.39
distortions    0.38
dangerous      0.37
resx           0.37
tar            0.36
dangerous      0.36
মিল            0.35
POSITIVE LOGITS
Token          Value
introvert      1.48
extro          1.21
Personality    1.10
Personality    1.10
personality    1.09
intro          1.05
Intro          1.00
personality    1.00
personalities  0.96
Intro          0.94
Activations Density 0.020%