INDEX
Explanations
references to a specific individual or personal identifier
Negative Logits
Token       Logit
stood       -0.68
stage       -0.67
managed     -0.64
lift        -0.64
locked      -0.64
ãĤ°         -0.63
tolerate    -0.62
ting        -0.61
chest       -0.61
lain        -0.60
Positive Logits
Token       Logit
ences       1.12
emi         0.98
ppo         0.94
oglu        0.92
otti        0.90
encia       0.89
ardo        0.89
zona        0.88
pe          0.88
ère         0.86
Activations Density: 0.006%