Explanations
expressions related to interpersonal relationships and social dynamics
Negative Logits
Token       Logit
-Bar        -0.16
AVA         -0.15
oub         -0.15
Äįer        -0.15
abei        -0.14
atis        -0.14
iske        -0.14
ÐĬ          -0.14
UiThread    -0.14
localize    -0.14
Positive Logits
Token       Logit
HOH         0.30
eviction    0.29
BB          0.27
ev          0.26
house       0.24
BB          0.23
alliance    0.22
House       0.21
bb          0.20
Big         0.20
Activation Density: 0.001%
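
To make the density figure concrete, here is a minimal sketch of how such a value could be computed. It assumes that density means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 100,000.
sample = [0.0] * 99_999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.001% corresponds to roughly one activating token per 100,000 sampled, which would fit a feature tied to a narrow topic.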