Explanations
text unit followed by a category
Negative Logits
Token        Value
jednotliv    0.41
尘           0.41
yesha        0.40
β            0.39
gow          0.39
minutes      0.38
Minutes      0.38
minutes      0.37
നാല          0.37
ွေး          0.37
Positive Logits
Token        Value
বিশিষ্ট      0.72
বিশিষ্ট      0.67
affair       0.60
minimum      0.54
wonder       0.53
wonders      0.53
minimum      0.51
씩           0.50
Wonder       0.47
максимум     0.47
Activations Density 0.037%