INDEX
Explanations
references to specific individuals and their statements or actions
Negative Logits
Token        Logit
supposedly   -0.90
reportedly   -0.89
apparently   -0.88
Apparently   -0.86
apparently   -0.85
allegedly    -0.82
Apparently   -0.81
according    -0.81
according    -0.81
menurut      -0.80
Positive Logits
Token        Logit
never        0.66
aldri        0.64
sengaja      0.61
kiệm         0.59
intention    0.59
之所以       0.56
nunca        0.56
aspetta      0.55
つもり       0.54
siente       0.53
Activations Density 0.303%
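
The density figure above can be read as the share of sampled tokens on which the feature fires at all. The following is a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the synthetic activation list are all hypothetical and are not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active. The dashboard's exact definition may differ.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must be non-empty")
    # Count the tokens where the feature fires.
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Synthetic data only: 303 active tokens out of 100,000 sampled tokens.
synthetic = [1.5] * 303 + [0.0] * 99_697
density = activation_density(synthetic)
print(f"{density:.3%}")
```

With these made-up numbers the ratio is 303 / 100,000, which is the same order of sparsity as the value reported above: the feature would be inactive on the vast majority of tokens.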