Explanations
references to personal feelings and interactions
Negative Logits
mourut       -0.83
geweest      -0.78
Enjoyed      -0.74
belonged     -0.68
Were         -0.67
进行了        -0.66
consisted    -0.66
existed      -0.65
wasnt        -0.65
提供了        -0.65
Positive Logits
make         1.22
take         1.21
get          1.14
decide       1.04
come         1.04
give         1.00
go           0.99
try          0.99
bring        0.97
create       0.95
Activations Density 0.462%