Explanations
words related to personal qualities or descriptions of people
references to specific individuals and their characteristics or actions
Negative Logits
ramid      -0.83
sequent    -0.73
encies     -0.71
ð          -0.70
iners      -0.69
inav       -0.68
uates      -0.67
使         -0.66
negie      -0.66
rued       -0.65
Positive Logits
brilliant      1.03
fearless       1.03
hilarious      1.01
charismatic    0.99
charming       0.99
adorable       0.99
terrific       0.97
handsome       0.96
gorgeous       0.95
fantastic      0.94
Activations Density: 0.233%
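
The density figure is most likely the fraction of sampled token positions on which this feature has a nonzero activation, though the page does not define it. A minimal sketch under that assumption follows; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.200%"
```

Under this reading, a density of 0.233% would mean the feature fires on roughly 2 to 3 of every 1000 tokens, which fits a feature tied to a narrow set of complimentary adjectives.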