INDEX
Explanations
words related to physical structures or locations
words pertaining to social dynamics and interactions
Negative Logits
Rabbit         -0.74
çĭ             -0.69
PH             -0.67
è£ħ            -0.64
stellar        -0.64
Bohem          -0.64
transitional   -0.63
Trin           -0.63
Human          -0.63
Chel           -0.62

Positive Logits
acters          1.03
oldown          0.95
akery           0.93
ilities         0.92
ards            0.90
ourse           0.90
aunts           0.88
usters          0.88
ourses          0.88
essing          0.87
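
Dashboards like this one commonly build the positive and negative logit lists in a specific way. They multiply the feature's decoder direction by the model's unembedding matrix, then rank every vocabulary token by the resulting score. This page does not say which method it uses, so treat that as an assumption. The sketch below illustrates the idea in plain Python. The function names `logit_effects` and `top_logits` are hypothetical, and the sketch ignores any final layer norm the real model may apply.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> List[Tuple[str, float]]:
    """Score each token by the dot product of the feature's decoder
    direction (length d_model) with that token's unembedding column.
    `unembed` is assumed to be a d_model x vocab_size matrix (row-major)."""
    d_model = len(decoder_dir)
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((tok, s))
    return scores

def top_logits(scores: List[Tuple[str, float]], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k most positive (or, with positive=False, most negative) tokens."""
    return sorted(scores, key=lambda t: t[1], reverse=positive)[:k]

# Toy example: 2-dimensional residual stream, 3-token vocabulary.
scores = logit_effects([1.0, -0.5],
                       [[1.0, 0.0, -2.0],
                        [0.0, 2.0, 0.0]],
                       ["a", "b", "c"])
print(top_logits(scores, 1))                  # [('a', 1.0)]
print(top_logits(scores, 1, positive=False))  # [('c', -2.0)]
```

Under this reading, the list above says that when the feature fires, it pushes up suffix tokens such as "acters" and "akery" and pushes down tokens such as "Rabbit".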

Activation Density: 0.093%
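
Activation density is usually read as the share of sampled tokens on which the feature's activation is above zero. Under that reading, 0.093% means the feature fires on roughly 93 of every 100,000 tokens. The exact threshold and token sample this dashboard uses are not stated, so the zero threshold below is an assumption. The function name `activation_density` is hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# A feature that fires on 93 out of 100,000 tokens.
acts = [0.0] * 99_907 + [1.0] * 93
print(activation_density(acts))  # 0.093
```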