INDEX
Explanations
positive adjectives followed by specific nouns
Negative Logits
áis         0.91
probed      0.88
uration     0.84
໕           0.84
behaves     0.84
Fleurit     0.83
税込        0.83
kron        0.83
ೋನ್         0.82
prostat     0.82
Positive Logits
way             0.82
idea            0.77
decline         0.75
atmosphere      0.70
presence        0.70
voice           0.70
companionship   0.66
man             0.65
woman           0.64
time            0.64
Activations Density: 0.000%