INDEX
Explanations
references to intervention strategies and related terminology
Negative Logits
Token       Logit
Sams        -0.77
säker       -0.75
Sams        -0.74
rdata       -0.66
fres        -0.63
Fres        -0.63
Pleasure    -0.60
grunt       -0.60
Ama         -0.60
Trello      -0.59
Positive Logits
Token           Logit
intervention    1.22
Intervention    1.16
Intervention    1.15
interventions   1.14
intervention    1.12
Interventions   1.12
Interventions   1.03
Invo            0.99
intervene       0.98
intervenir      0.92
Activations Density 0.157%
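
The density figure above can be read as the share of token positions on which the feature fires. The sketch below shows one way such a number could be computed. It assumes that "active" means an activation strictly above a threshold of zero, which this page does not confirm; the function name `activation_density` and the sample data are hypothetical and exist only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is strictly
    greater than `threshold`; the page itself does not state the definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 16 of which are active.
sample = [0.0] * 9984 + [1.5] * 16
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: it stays silent on almost all tokens and fires only on a narrow set, which fits the intervention-related tokens listed under Positive Logits.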