INDEX
Explanations
occurrences of high numerical values associated with legal terminology or actions
Negative Logits
Tikang    -0.93
ed        -0.87
ʺ         -0.78
tron      -0.72
ank       -0.72
Kw        -0.70
hoàng     -0.69
bank      -0.68
Oy        -0.68
cron      -0.65
Positive Logits
↵↵↵             1.85
↵↵↵↵            1.48
↵↵↵↵↵↵          1.38
↵↵↵↵↵           1.35
↵↵↵↵↵↵↵         1.27
↵↵↵↵↵↵↵↵        1.20
にほんブログ村  1.17
↵↵↵↵↵↵↵↵↵↵↵↵    1.15
↵↵↵↵↵↵↵↵↵↵↵     1.15
↵↵↵↵↵↵↵↵↵       1.10
Activations Density 0.095%