INDEX
Explanations
references to representation or acting on someone's behalf
Negative Logits
ULTY       -0.54
easily     -0.48
Ein        -0.47
[missing]  -0.46
روز        -0.45
no         -0.44
Gus        -0.44
amak       -0.44
far        -0.43
cru        -0.43
Positive Logits
Representing    1.51
mewakili        1.46
Represent       1.42
representing    1.42
Represent       1.40
représentants   1.39
representing    1.37
represent       1.36
representative  1.36
representative  1.35
Activations Density 0.178%