INDEX
Explanations
instances of the word "smart" and related concepts of intelligence
Negative Logits
ĸļ          -0.79
artifacts   -0.74
dissatisf   -0.68
Reloaded    -0.66
Divinity    -0.64
Syndrome    -0.62
avored      -0.62
ãĥĺãĥ©      -0.60
riott       -0.60
stret       -0.59
Positive Logits
guy         0.97
ctl         0.93
sonian      0.93
ness        0.90
ly          0.90
arse        0.89
ass         0.88
ling        0.86
found       0.86
asses       0.84
Activations Density 0.013%