INDEX
Explanations
actions that involve receiving or accepting something
Negative Logits
Token       Logit
tat         -0.16
everywhere  -0.16
avir        -0.15
elif        -0.15
odon        -0.14
Affected    -0.14
tae         -0.14
KB          -0.14
üst         -0.14
cht         -0.13
Positive Logits
Token       Logit
hosting     0.17
Capacity    0.16
capacity    0.16
capacity    0.16
received    0.15
receive     0.15
incoming    0.15
_receive    0.15
arking      0.15
yne         0.15
Activations Density: 0.267%