INDEX
Explanations
words related to physical activity and fitness
variations of the word "get" in different contexts
Negative Logits
Palestin     -0.70
intest       -0.66
Ire          -0.64
obser        -0.64
defe         -0.63
charact      -0.62
distingu     -0.62
conduc       -0.62
departure    -0.61
livest       -0.61
Positive Logits
TING     1.26
tin      1.24
ters     0.96
ti       0.92
tered    0.90
rics     0.87
ter      0.87
rov      0.86
emp      0.86
tes      0.86
Activations Density 0.022%