INDEX
Explanations
names mentioned in text along with related descriptions
instances of the letter 'L'
Negative Logits
democracy       -0.70
deterrent       -0.65
relevance       -0.64
behaviour       -0.64
deterrence      -0.63
Sud             -0.63
disadvant       -0.62
consequential   -0.62
fruitful        -0.62
uyomi           -0.60
Positive Logits
ï¸ı             1.20
agree           0.81
hole            0.80
dro             0.78
treated         0.76
felt            0.76
ï¸              0.75
wrote           0.74
iced            0.74
cre             0.73
Activations Density 0.351%