INDEX
Explanations
references to specific items or subjects mentioned in the text
Negative Logits
Ceramby     -0.53
more        -0.53
mazoo       -0.50
どころ      -0.50
outside     -0.50
espan       -0.49
consider    -0.47
voll        -0.47
しか        -0.46
dziew       -0.45
Positive Logits
this        1.44
this        1.43
هذه         1.39
THIS        1.37
dieses      1.33
THIS        1.32
these       1.31
este        1.31
ഈ           1.30
This        1.28
Activations Density 0.168%
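
The density figure above is commonly read as the fraction of sampled token positions on which the feature fires. The page does not state how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the synthetic sample are all hypothetical and are not taken from the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic illustration only: 100,000 token positions, 168 of them active.
# These counts are made up to show the arithmetic; they are not the real sample.
sample = [0.0] * 99_832 + [1.0] * 168
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.168% means the feature is active on roughly 1 in 600 tokens, which would be consistent with a sparse feature tied to a narrow set of tokens such as the demonstratives in the positive logits list.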