INDEX
Explanations
base for subsequent concepts
Negative Logits
involving    0.91
بالنسبه    0.84
بالنسبة    0.82
nel    0.82
ift    0.81
위한    0.80
pertaining    0.79
charme    0.78
consisting    0.77
regarding    0.77
Positive Logits
ভয়ে    0.80
چنین    0.80
или    0.77
CurrentByte    0.76
или    0.74
них    0.74
বললো    0.74
Updated    0.72
либо    0.72
comportamenti    0.72
Activations Density 0.076%