Explanations
personal pronouns indicating actions or emotions directed towards oneself or others
Negative Logits
Token           Logit
Associated      -0.68
quartered       -0.66
Rousse          -0.66
Electrical      -0.63
Globe           -0.62
laughter        -0.61
Canaver         -0.61
understatement  -0.61
Stra            -0.61
holding         -0.60
Positive Logits
Token           Logit
've             1.10
reached         0.96
arrived         0.94
got             0.92
realise         0.92
finally         0.92
realize         0.91
're             0.90
realized        0.90
reaches         0.90
Activations Density: 0.145%