INDEX
Explanations
mentions of something being "backed" or supported
words related to physical actions or states involving "up" or "down."
Negative Logits
beit        -0.65
retri       -0.61
warr        -0.61
carriage    -0.60
scape       -0.59
PF          -0.59
trave       -0.58
Sm          -0.58
Siber       -0.57
ancest      -0.55
Positive Logits
olicy        1.20
ublic        0.93
rison        0.87
inion        0.83
odcast       0.82
dates        0.80
utics        0.79
osition      0.78
pping        0.77
onent        0.76
Activations Density 0.018%